TVD: A Reproducible and Multiply Aligned TV Series Dataset
Authors
Abstract
We introduce a new dataset built around two TV series from different genres: The Big Bang Theory, a situation comedy, and Game of Thrones, a fantasy drama. The dataset has multiple tracks extracted from diverse sources, including dialogue (manual and automatic transcripts, multilingual subtitles), crowd-sourced textual descriptions (brief episode summaries, longer episode outlines) and various metadata (speakers, shots, scenes). The paper describes the dataset and provides tools to reproduce it for research purposes, for anyone who has legally acquired the DVD sets of the series. Tools are also provided to temporally align a major subset of the dialogue and description tracks, so that the complementary information present in these tracks can be combined for enhanced accessibility. For alignment, we treat the tracks as comparable corpora and first apply an existing algorithm for aligning such corpora, based on dynamic time warping and TF-IDF-based similarity scores. We improve this baseline algorithm using contextual information, WordNet-based word similarity and scene location information. We report the performance of these algorithms on a manually aligned subset of the data. To highlight the value of the dataset, we report a use case involving rich speech retrieval and suggest other uses.
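The baseline alignment mentioned above (dynamic time warping over TF-IDF similarity scores) can be illustrated with a minimal sketch. It is not the authors' released tool: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `dtw_align`), the lowercase word tokenization, the smoothed IDF computed over the pooled sentences of both tracks, and the step cost of 1 minus cosine similarity are all assumptions made for this illustration. The contextual, WordNet-based and scene-location refinements are not modeled.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased word tokens; a deliberately simple choice for this sketch.
    return re.findall(r"\w+", sentence.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per token list.

    Uses smoothed IDF, log((1 + N) / (1 + df)) + 1, so every weight is positive.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def dtw_align(src, tgt):
    """Align two sentence lists; return the DTW path as (src_idx, tgt_idx) pairs."""
    # Assumption: IDF is computed over the pooled sentences of both tracks.
    vecs = tfidf_vectors([tokenize(s) for s in src + tgt])
    a, b = vecs[:len(src)], vecs[len(src):]
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 1.0 - cosine(a[i - 1], b[j - 1])  # dissimilarity of the pair
            acc[i][j] = cost + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the end; on ties the diagonal step wins (dict order).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        prev = {(i - 1, j - 1): acc[i - 1][j - 1],
                (i - 1, j): acc[i - 1][j],
                (i, j - 1): acc[i][j - 1]}
        i, j = min(prev, key=prev.get)
    return path[::-1]


if __name__ == "__main__":
    # Toy sentences invented for illustration, not taken from the dataset.
    dialogue = ["Sheldon knocks on the door three times",
                "Leonard asks Penny out to dinner",
                "Howard builds a robotic arm"]
    outline = ["Sheldon knocks three times",
               "Leonard invites Penny to dinner",
               "Howard shows off his robotic arm"]
    print(dtw_align(dialogue, outline))  # [(0, 0), (1, 1), (2, 2)]
```

Because DTW permits one-to-many steps, several consecutive dialogue lines can be mapped to a single description sentence, which matters when a dense dialogue track is aligned with a much shorter summary or outline track.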