The University of Maryland translation system for IWSLT 2007
Abstract
This paper describes the University of Maryland statistical machine translation system used in the IWSLT 2007 evaluation. Our focus was threefold: using hierarchical phrase-based models in spoken language translation, incorporating sub-lexical information in model estimation via morphological analysis (Arabic) and word and character segmentation (Chinese), and using n-gram sequence models for source-side punctuation prediction. Our efforts yield significant improvements in Chinese-English and Arabic-English translation tasks for both spoken language and human transcription conditions.
Similar resources
The UMD Machine Translation Systems at IWSLT 2016: English-to-French Translation of Speech Transcripts
We describe the University of Maryland machine translation system submitted to the IWSLT 2016 Microsoft Speech Language Translation (MSLT) English-French task. Our main finding is that translating conversation transcripts turned out not to be as challenging as we expected: while translation quality is of course not perfect, a straightforward phrase-based system trained on movie subtitles yields ...
The University of Edinburgh system description for IWSLT 2007
We present the University of Edinburgh’s submission for the IWSLT 2007 shared task. Our efforts focused on adapting our statistical machine translation system to the open data conditions for the Italian-English task of the evaluation campaign. We examine the challenges of building a system with a limited set of in-domain development data (SITAL), a small training corpus in a related but distinc...
The UMD Machine Translation Systems at IWSLT 2015
We describe the University of Maryland machine translation systems submitted to the IWSLT 2015 French-English and Vietnamese-English tasks. We built standard hierarchical phrase-based models, extended in two ways: (1) we applied novel data selection techniques to select relevant information from the large French-English training corpora, and (2) we experimented with neural language models. Our ...
The University of Washington machine translation system for the IWSLT 2007 competition
This paper presents the University of Washington’s submission to the 2007 IWSLT benchmark evaluation. The UW system participated in two data tracks, Italian-to-English and Arabic-to-English. Our main focus was on incorporating out-of-domain data, which contributed to improvements for both language pairs in both the clean text and ASR output conditions. In addition, we compared supervised and se...
The CASIA phrase-based statistical machine translation system for IWSLT 2007
This paper describes our phrase-based statistical machine translation system (CASIA) used in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2007. In this year's evaluation, we participated in the open data track of clean text for Chinese-to-English machine translation. Here, we mainly introduce the overview of the system, the primary modules, th...