Efficiency of Speech Alignment for Semi-automated Subtitling in Dutch

Authors

  • Patrick Wambacq
  • Kris Demuynck
Abstract

This paper describes the use of speech alignment to aid in the process of subtitling Dutch TV programs. The recognizer aligns the audio stream with an existing transcript. The goal is therefore not to transcribe but to generate the correct timing of every word. The system performs subtasks such as audio segmentation, transcript preprocessing, alignment and subtitle compression. The result is not perfect, but it is good enough to gain efficiency when a professional subtitler uses it as a starting point to refine and finalize the subtitles. In our tests, average time savings of 47 to 53% are obtained, such that the time needed to generate subtitles for a 1-hour program is lowered from between 4 and 7 hours to between 2.5 and 4 hours. This is all the more important given the increasing pressure from user groups on governments and broadcasters to reach 100% subtitled TV programs.
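
As an illustration of how per-word timings from an alignment step can feed subtitle generation, the following is a minimal sketch and not the authors' method. The `Word` and `Cue` types, the `group_words` function, the 37-character line limit and the 1.0 second pause threshold are all hypothetical choices made for this example. The abstract does not describe how the system's compression step works, so none is attempted here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # start time in seconds, as produced by an aligner
    end: float    # end time in seconds


@dataclass
class Cue:
    start: float
    end: float
    text: str


def group_words(words: List[Word], max_chars: int = 37,
                max_gap: float = 1.0) -> List[Cue]:
    """Group aligned words into subtitle cues.

    A new cue starts when adding the next word would exceed max_chars,
    or when the pause before the next word is longer than max_gap.
    Both thresholds are illustrative assumptions.
    """
    cues: List[Cue] = []
    current: List[Word] = []

    def flush() -> None:
        # Close the cue under construction, if any.
        if current:
            text = " ".join(w.text for w in current)
            cues.append(Cue(current[0].start, current[-1].end, text))
            current.clear()

    for word in words:
        if current:
            candidate_len = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            gap = word.start - current[-1].end
            if candidate_len > max_chars or gap > max_gap:
                flush()
        current.append(word)
    flush()
    return cues


if __name__ == "__main__":
    aligned = [
        Word("goede", 0.0, 0.4),
        Word("avond", 0.45, 0.9),
        Word("welkom", 2.5, 3.0),
    ]
    for cue in group_words(aligned):
        print(f"{cue.start:.2f} --> {cue.end:.2f}  {cue.text}")
```

In this sketch, a professional subtitler would still review and adjust the resulting cues, which matches the semi-automated workflow described in the abstract.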

Similar resources

STON: Efficient Subtitling in Dutch Using State-of-the-Art Tools

We present a modular video subtitling platform that integrates speech/non-speech segmentation, speaker diarisation, language identification, Dutch speech recognition with state-of-the-art acoustic models and language models optimised for efficient subtitling, appropriate pre- and postprocessing of the data and alignment of the final result with the video fragment. Moreover, the system is able to ...

Real-time live broadcast news subtitling system for Spanish

Subtitling of live broadcast news is a very important application for meeting the needs of deaf and hard-of-hearing people. However, live subtitling is a costly operation in terms of qualified human resources, and thus money, if high precision is desired. Automatic Speech Recognition researchers can help with this task, saving both time and money, by developing systems that deliver subtitle...

Cost Function Modelling for Semi-automated SC, RTG and Automated and Semi-automated RMG Container Yard Operating Systems

This study analyses the concept of cost functions for semi-automated Straddle Carrier (SC), Rubber Tyred Gantry (RTG) and automated Rail Mounted Gantry (RMG) container yard operating cranes. It develops a generic cost based model for a pair-wise comparison, analysis and evaluation of economic efficiency and effectiveness of container yard equipment to be used for decision-making by terminal pla...

Language Models of Spoken Dutch

In Flanders, all TV shows are subtitled. However, the process of subtitling is a very time-consuming one and can be sped up by providing the output of a speech recognizer run on the audio of the TV show, prior to the subtitling. Naturally, this speech recognition will perform much better if the employed language model is adapted to the register and the topic of the program. We present several l...

SAVAS: Collecting, Annotating and Sharing Audiovisual Language Resources for Automatic Subtitling

This paper describes the data collection, annotation and sharing activities carried out within the FP7 EU-funded SAVAS project. The project aims to collect, share and reuse audiovisual language resources from broadcasters and subtitling companies to develop large vocabulary continuous speech recognisers in specific domains and new languages, with the purpose of solving the automated subtitling ...

Publication date: 2011