Notes from MLLP post-processing
=====

Input by jsilvestre@dsic.upv.es, gogardia@vrain.upv.es, members of mllp-support@upv.es

Post-process in [the MLLP interface](https://ttp.mllp.upv.es/index.php?page=videos&npage=1) the [LHCP](https://indico.cern.ch/event/856696/) lectures in the _dev_ category.

## Log of processed videos in MLLP

The videos listed below were post-processed by Amine. Other videos done by Maria, Miguel-Angel, Rene, Ruben, Thomas are in [Ruben's codimd note - requires CERN login](https://codimd.web.cern.ch/CkA_VyauS_CYqXZrqPzPQg#16112020-at-1000am).

| Lecture | Video duration | Time spent to post-process | Status (done or NA) |
| ---- | -------------- | ------------ | --------- |
| ~~[856696c64](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiJkYTJjMjFlMWM2MTNhYWY3OTEzODhmOWVhYWFhMmQ2ZDhkOTk2Njc0N2NjMzVkY2VlODA4YTBmZWM2ZWY2YTM5NjUyMzU2NDFiN2YwNWNlOSIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmM2NCIsImxhbmciOiJlbiIsImF1dGhvcl9uYW1lIjoiQ0VSTiIsImF1dGhvcl9jb25mIjoxMDAsImV4cGlyZSI6MTYwNTYyOTA1Mn0%3D)~~ [856696c64](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiI1N2VjNTFkNDY1NjVhZGM5ZDkwYzVlZjkxZmVkYTc0MTZmZDgyZGJhNzlhMDg1NjY4YzIwZTE0MjgzZTk0MWM3NWJkMmYxNTgzMzg3NzkzZSIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmM2NCIsImxhbmciOiJlbiIsImF1dGhvcl9uYW1lIjoiQ0VSTiIsImF1dGhvcl9jb25mIjoxMDAsImV4cGlyZSI6MTYwNzA3MzY0OH0%3D) | 29:44 | 16/11/2020: 25 min (10 min for checking guidelines + 15 min for adjusting segments) | video vanished |
| [856696c117](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiJjNWU0NDg3YzYxOWYxYmQ4MzdhZmY1ODQ5MzIyZmFiZDYxYmMxZDE3OTFiN2EzMjM5NzZkNDY2MzgzMDcyZDU4OTM3ZGIyOGJmYjVkNjhjZiIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmMxMTciLCJsYW5nIjoiZW4iLCJhdXRob3JfbmFtZSI6IkNFUk4iLCJhdXRob3JfY29uZiI6MTAwLCJleHBpcmUiOjE2MDU3MzE5MTh9) | 19:49 | 17/11/2020: 4 hours - 18/11/2020: 4.5 hours - total 8.5 hours | done |
| [856696c138](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiJmNTg2Y2VhNjJjNTY2N2ZjZGQ4YWZlNDNjN2MyNTFhZDk0ODQ3NmIxYmVhYTUyNzMwYWFmNjRkMmQ2OTRkODg1NWI2OTA1NWRhOWFhMjc4NiIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmMxMzgiLCJsYW5nIjoiZW4iLCJhdXRob3JfbmFtZSI6IkNFUk4iLCJhdXRob3JfY29uZiI6MTAwLCJleHBpcmUiOjE2MDU3ODk1ODJ9) | 17:40 | 20/11/2020: 2 hours - 21/11/2020: 2 hours - 22/11/2020: 2.5 hours - total 6.5 hours | done |
| [856696c228](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiI4NGUwY2EwZGZjODZhYzk5N2Y2YTcyYjQyOTE0NmVkMzViMmU5OTg1MTU5ZmJiOWJlMWVlZTFjNTZjNTU3MTliNDQ3MzEwZTg3MGE4M2Q5ZSIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmMyMjgiLCJsYW5nIjoiZW4iLCJhdXRob3JfbmFtZSI6IkNFUk4iLCJhdXRob3JfY29uZiI6MTAwLCJleHBpcmUiOjE2MDU3ODk3ODR9) | 20:52 | 23/11/2020: 6 hours | done |
| [856696c31](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiIyNjg5YjkzZDM2MmFmNzczMDYzYTY2MWFmYTE3ZTlmMDZjNWE3Mjk0MDY0NjUyYWQ5ZWFkMGQ2NTUyMzIyNjljMTFhODM3NTcyMDE2NzI0MSIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmMzMSIsImxhbmciOiJlbiIsImF1dGhvcl9uYW1lIjoiQ0VSTiIsImF1dGhvcl9jb25mIjoxMDAsImV4cGlyZSI6MTYwNTc4OTg2M30%3D) | 24:32 | 24/11/2020: 7.5 hours | done |
| [856696c43](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiI2YTlkZjM4Mjg5YTBkMmY4MGM2MzdmYjg3ZDA5YTQxNGE4ZWNlZGYxMDM3ZTcwZjg2MjhkMDY0NzhhOGIzZDYyMGUxYWE5MTFiZjg5ZTNkZCIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmM0MyIsImxhbmciOiJlbiIsImF1dGhvcl9uYW1lIjoiQ0VSTiIsImF1dGhvcl9jb25mIjoxMDAsImV4cGlyZSI6MTYwNTc4OTkxNX0%3D) | 36:12 | 25/11/2020: 3 hours - 27/11/2020: 1 hour - 28/11/2020: 3 hours - total 7 hours | done |

## Log of translated videos in MLLP

The videos listed below were post-processed by Amine.
Other videos done by Maria, Thomas are in [Ruben's codimd note - requires CERN login](https://codimd.web.cern.ch/CkA_VyauS_CYqXZrqPzPQg#16112020-at-1000am).

| Lecture | Video duration | Time spent to post-process | Status (done or NA) |
| ---- | -------------- | ------------ | --------- |
| [856696c117](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiJjNWU0NDg3YzYxOWYxYmQ4MzdhZmY1ODQ5MzIyZmFiZDYxYmMxZDE3OTFiN2EzMjM5NzZkNDY2MzgzMDcyZDU4OTM3ZGIyOGJmYjVkNjhjZiIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmMxMTciLCJsYW5nIjoiZW4iLCJhdXRob3JfbmFtZSI6IkNFUk4iLCJhdXRob3JfY29uZiI6MTAwLCJleHBpcmUiOjE2MDU3MzE5MTh9) | 19:49 | 30/11/2020: 6.5 hours - 01/12/2020: 0.5 hour - 05/12/2020: 1 hour - total 8 hours | done |
| [856696c43](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiI2YTlkZjM4Mjg5YTBkMmY4MGM2MzdmYjg3ZDA5YTQxNGE4ZWNlZGYxMDM3ZTcwZjg2MjhkMDY0NzhhOGIzZDYyMGUxYWE5MTFiZjg5ZTNkZCIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmM0MyIsImxhbmciOiJlbiIsImF1dGhvcl9uYW1lIjoiQ0VSTiIsImF1dGhvcl9jb25mIjoxMDAsImV4cGlyZSI6MTYwNTc4OTkxNX0%3D) | 36:12 | 01/12/2020: 8 hours - 02/12/2020: 4 hours - 05/12/2020: 2.5 hours - total 14.5 hours | done |
| [856696c31](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiIyNjg5YjkzZDM2MmFmNzczMDYzYTY2MWFmYTE3ZTlmMDZjNWE3Mjk0MDY0NjUyYWQ5ZWFkMGQ2NTUyMzIyNjljMTFhODM3NTcyMDE2NzI0MSIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmMzMSIsImxhbmciOiJlbiIsImF1dGhvcl9uYW1lIjoiQ0VSTiIsImF1dGhvcl9jb25mIjoxMDAsImV4cGlyZSI6MTYwNTc4OTg2M30%3D) | 24:32 | 07/12/2020: 8.5 hours | done |
| [856696c220](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiI3MjIyYWI1Nzc4MDI0ZDUzZWJjYmNiZDBkYWY2YWVkY2M4MTBlNjE0MWM2YTllMGI3NTlkOTdlMTViZjkwZTE5OWRlYTdkYjFiYWMyODkzYSIsImlkIjoibGhjcDIwMjAtdGVzdC04NTY2OTZjMjIwIiwibGFuZyI6ImVuIiwiYXV0aG9yX25hbWUiOiJDRVJOIiwiYXV0aG9yX2NvbmYiOjEwMCwiZXhwaXJlIjoxNjA2ODMyMTI1fQ%3D%3D) | 19:51 | 08/12/2020: 7 hours | done |
| [856696c228](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiI4NGUwY2EwZGZjODZhYzk5N2Y2YTcyYjQyOTE0NmVkMzViMmU5OTg1MTU5ZmJiOWJlMWVlZTFjNTZjNTU3MTliNDQ3MzEwZTg3MGE4M2Q5ZSIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmMyMjgiLCJsYW5nIjoiZW4iLCJhdXRob3JfbmFtZSI6IkNFUk4iLCJhdXRob3JfY29uZiI6MTAwLCJleHBpcmUiOjE2MDU3ODk3ODR9) | 20:52 | 08/12/2020: 45 minutes - 09/12/2020: 3 hours - 10/12/2020: 1 hour - 12/12/2020: 2 hours - 14/12/2020: 2.75 hours - total 9 hours | done |
| [856696c138](https://ttp.mllp.upv.es/player/?request=eyJhcGlfdXNlciI6ImNlcm4iLCJhdXRob3JfaWQiOiJjZXJuIiwicmVxdWVzdF9rZXkiOiJmNTg2Y2VhNjJjNTY2N2ZjZGQ4YWZlNDNjN2MyNTFhZDk0ODQ3NmIxYmVhYTUyNzMwYWFmNjRkMmQ2OTRkODg1NWI2OTA1NWRhOWFhMjc4NiIsImlkIjoibGhjcDIwMjAtZGV2LTg1NjY5NmMxMzgiLCJsYW5nIjoiZW4iLCJhdXRob3JfbmFtZSI6IkNFUk4iLCJhdXRob3JfY29uZiI6MTAwLCJleHBpcmUiOjE2MDU3ODk1ODJ9) | 17:40 | 14/12/2020: 3 hours - 15/12/2020: 4.30 hours - total 7.30 hours | done |

## Observations with the MLLP interface

1. In the TLP Subtitle editor interface the _Layout_ option _Horizontal_ may seem rather vertical - video on the left, text on the right. Still it makes sense to be called _Horizontal_ because both elements (video, text boxes) are displayed in the same row; in contrast with the _Vertical_ layout, in which both elements are displayed in the same column.
2.
   The classification of a video in the MLLP _dev_ as opposed to _test_ group was done randomly, subject to two constraints: the _dev_ and _test_ sets should have similar length (5.8h vs 5.9h) and similar speaker gender distribution (9 male / 4 female vs 9 male / 5 female).
3. Behaviour the user experiences, with the technical explanation by the MLLP experts below:
   1. When you edit your video and _Save_ your modifications, the timestamp of each save is not reflected in the _Edit History_. For a new entry in the _Edit History_ to appear you have to _Reload_ the page.
   2. Sometimes, after you _Save_ your changes, with or without _Reload_, it takes a while for the MLLP platform to update its state. As a result, for a short while, if you come back to the video, it will show the subtitles **before** your last edit and _Save_ (even though your edit has been saved correctly, and it will eventually show up in the video).

   For points 3.1 and 3.2 above, one needs to know how the platform works internally:

   * The TTP player works with _editing sessions_. An editing session starts when the user opens the TTP player with a particular media object. We account for that as the TTP player performs a /speech/start_session call to our backend API.
   * During the session, the user saves her/his work from time to time, pressing "Save changes". These changes are not saved into the actual subtitles file(s), but are "buffered" - combined with the previous saves - in the database record corresponding to that editing session. These saves are sent to our backend API as the TTP player performs a /speech/mod call on every "Save changes" button click.
   * Finally, when the user finishes her/his work and decides to leave the TTP player, she/he can A) close the corresponding browser's tab/window, or B) select the "Exit" option in the left menu.
   * After closing the editing session, all edits are committed to the actual subtitle files and, if the editor has enough privileges (by default she/he does), are made "public", that is, available to other users, either editors in the TTP player, or viewers via your institutional video player (Paella?), properly linked to our API (see /speech/get call).
   * However, this last point is where your concern (3.2) comes from.
   * The backend needs to receive an API call to the /speech/end_session endpoint in order to realise that the corresponding editing session has actually ended.
   * It must be noted that the API calls to our backend made by the TTP Player are asynchronous (mandatory if we want to ensure an acceptable user experience).
   * Due to the way web browsers are implemented, and given the asynchronous nature of the API calls of the TTP Player, whether A) or B) happens, there is no guarantee that this /speech/end_session API call is ever going to be executed by the browser. It is very likely that the tab/window will be closed before the API call is performed. Sometimes the call goes through, sometimes it does not. From our experience, the chances are something like 25% (yes) / 75% (no).
   * Hence, to overcome this nondeterministic /end_session API call issue, the platform runs a daemon that, from time to time, inspects all open editing sessions and, if it detects an editing session with no activity, automatically closes it, allowing all subtitle changes to be committed and published. In normal conditions, these "zombie" editing sessions should be automatically closed 5 minutes after the user has closed the TTP Player.
   * This daemon is also the one responsible for automatically computing WER / BLEU values of user edits against the former transcription/translation, among other tasks.

   Why edit sessions? We do not want to commit and publish directly partial changes to the subtitles until the reviewer finishes the whole process.
   Also, for users who are not editors or have no rights on the media, we do not want their changes to be public directly; instead, we let the owner of the media or an editor review such changes and, eventually, accept or discard them. This is the approach we have adopted since TLP v2.1.0, July 2015. Currently, TTP is based on TLP v3.8.0, April 2020.

   The above is the answer to 3.1. The "Edit history" option only shows _Editing Sessions_ records, where each one comprises all "Save changes" clicks done under that particular editing session. There is no granularity at the level of partial changes.
4. If, during post-processing, the editor moves the text above the "sound waves", there is no effect on the WER calculation.

   ![](https://codimd.web.cern.ch/uploads/upload_f63010552e894f2105c2950b0336087c.png)
   ![](https://codimd.web.cern.ch/uploads/upload_17dbab32167d10610a7d013ad43babe1.png)
5. The appended image shows how a user finds the WER value: By clicking on the 3 dots and _Video Information_?

   WER is defined as the minimum number of elementary edit operations (word substitutions, insertions and deletions) that are needed, in this case, to correct the former transcription in order to match your revised version, normalized by the total number of words of your revised transcription. Intuitively, it can be understood as the percentage of words from the former transcription that have to be amended to get the correct (your) transcription. For example, if the former transcription needs 30 elementary editing operations to match your amended transcription with a length of 200 words, then the WER will be 15%.

   ![](https://codimd.web.cern.ch/uploads/upload_1ea125b4bf7da524c1486813c6a789b0.png)
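
To make the WER definition in observation 5 concrete, here is a minimal Python sketch that counts word-level substitutions, insertions and deletions with a standard edit-distance table and divides by the length of the revised transcription. The function name `word_error_rate`, the whitespace tokenization and the case-sensitive comparison are assumptions made for illustration; the MLLP platform may tokenize or normalize text differently, so its reported values need not match this sketch exactly.

```python
def word_error_rate(former: str, revised: str) -> float:
    """Word-level edit distance from `former` to `revised`,
    normalized by the number of words in `revised`.

    Assumption: words are split on whitespace and compared exactly.
    """
    hyp = former.split()   # automatic (former) transcription
    ref = revised.split()  # human-corrected (revised) transcription
    if not ref:
        raise ValueError("revised transcription must contain at least one word")

    # prev[j] holds the edit distance between the processed prefix of hyp
    # and ref[:j]; initially the prefix is empty, so j insertions are needed.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)  # i deletions turn hyp[:i] into nothing
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr[j] = min(
                prev[j] + 1,         # delete h from the former transcription
                curr[j - 1] + 1,     # insert r into the former transcription
                prev[j - 1] + cost,  # keep h (match) or substitute it by r
            )
        prev = curr
    return prev[-1] / len(ref)


# The example from the text: 30 operations against 200 revised words.
former = " ".join(["x"] * 30 + ["w"] * 170)
revised = " ".join(["w"] * 200)
print(f"{word_error_rate(former, revised):.0%}")  # prints "15%"
```

Because the normalization uses the revised transcription's length, a former transcription that is much longer than the corrected one can yield a WER above 100%.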