Unsupervised Vocabulary Selection for Domain-Independent Simultaneous Lecture Translation
Abstract
In this work, we investigate methods to automatically adapt our simultaneous lecture translation systems to the diverse topics that occur in educational lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the World Wide Web and generates lecture-specific models and vocabularies based on the resulting documents. We focus on vocabulary selection, a critical aspect of simultaneous translation systems, where the occurrence of out-of-vocabulary words significantly degrades intelligibility. We propose a novel approach based on feature-based ranking and evaluate the effectiveness of 21 different features and their combinations for this task. On the interACT German-English simultaneous lecture translation system, our proposed approach significantly improved vocabulary coverage, reducing the out-of-vocabulary rate by 60% on average and by up to 84% compared to a lecture-independent baseline. Furthermore, a 40k vocabulary selected using our method obtained better coverage than a lecture-independent 300k vocabulary, improving intelligibility and reducing the latency of the end-to-end system.
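To make the two central quantities concrete, the following is a minimal sketch of an out-of-vocabulary rate and of a ranking-based vocabulary selection. It is not the paper's implementation: the abstract does not name its 21 features or say how they are combined, so the two features used here (`background_freq`, `retrieved_freq`), the linear weighting, and all function names are illustrative assumptions.

```python
from collections import Counter


def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def build_features(background_tokens, retrieved_tokens):
    """Compute two hypothetical features per candidate word:
    relative frequency in a lecture-independent background corpus and
    relative frequency in documents retrieved for this lecture."""
    background = Counter(background_tokens)
    retrieved = Counter(retrieved_tokens)
    background_total = sum(background.values()) or 1
    retrieved_total = sum(retrieved.values()) or 1
    return {
        word: {
            "background_freq": background[word] / background_total,
            "retrieved_freq": retrieved[word] / retrieved_total,
        }
        for word in set(background) | set(retrieved)
    }


def select_vocabulary(candidate_features, weights, size):
    """Rank candidates by a weighted sum of their features (an assumed
    combination rule) and keep the `size` best-scoring words."""
    def score(word):
        features = candidate_features[word]
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())

    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(candidate_features, key=lambda word: (-score(word), word))
    return set(ranked[:size])
```

In this sketch, a vocabulary would be chosen with `select_vocabulary(build_features(background, retrieved), weights, size)` and then evaluated by calling `oov_rate` on a lecture transcript; raising the weight on `retrieved_freq` favors words from the lecture-specific documents over generally frequent words.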
Similar resources
Unsupervised vocabulary selection for simultaneous lecture translation
In this work, we propose a novel method for vocabulary selection which enables simultaneous speech recognition systems for lectures to automatically adapt to the diverse topics that occur in educational and scientific lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the World ...
Unsupervised Acoustic Model Training for Simultaneous Lecture Translation in Incremental and Batch Mode
In this work the theoretical concepts of unsupervised acoustic model training and the application and evaluation of unsupervised training schemes are described. Experiments aiming at speaker adaptation via unsupervised training are conducted on the KIT lecture translator system. Evaluation takes place with respect to training efficiency and overall system performance in dependency of the availabl...
Incremental Unsupervised Training for University Lecture Recognition
In this paper we describe our work on unsupervised adaptation of the acoustic model of our simultaneous lecture translation system. We trained a speaker independent acoustic model, with which we produce automatic transcriptions of new lectures in order to improve the system for a specific lecturer. We compare our results against a model that was trained in a supervised way on an exact manual tr...
Discriminative MCE-based speaker adaptation of acoustic models for a spoken lecture processing task
This paper investigates the use of minimum classification error (MCE) training in conjunction with speaker adaptation for the large vocabulary speech recognition task of lecture transcription. Emphasis is placed on the case of supervised adaptation, though an examination of the unsupervised case is also conducted. This work builds upon our previous work using MCE training to construct speaker i...
A real-world system for simultaneous translation of German lectures
We present a real-time automatic speech translation system for university lectures that can interpret several lectures in parallel. University lectures are characterized by a multitude of diverse topics and a large amount of technical terms. This poses specific challenges, e.g., a very specific vocabulary and language model are needed. In addition, in order to be able to translate simultaneousl...