REMOOV: A Tool for Online Handling of Out-of-Vocabulary Words in Machine Translation
Abstract
REMOOV is a tool for online handling of out-of-vocabulary (OOV) words in statistical machine translation. REMOOV employs four techniques. Spelling expansion and morphological expansion are used to produce alternative in-vocabulary (INV) forms of OOV words. Dictionary term expansion and proper name transliteration produce target translations directly. These techniques can be used to expand the phrase table utilized in decoding or as part of an input/output lattice expansion. Results of using REMOOV show a consistent improvement over a state-of-the-art baseline. This paper describes the different components and parameters of the REMOOV tool.
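As a rough illustration of how spelling expansion can feed phrase-table expansion, the sketch below maps an OOV word to in-vocabulary forms one edit away and borrows their translations. It is not REMOOV's code: the single-edit model, the toy phrase table, the `penalty` factor, and the names `single_edit_variants` and `expand_phrase_table` are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy phrase table: source word -> list of (target phrase, score).
PhraseTable = Dict[str, List[Tuple[str, float]]]


def single_edit_variants(word: str, alphabet: str) -> Set[str]:
    """Return all strings exactly one edit away from `word` (assumed edit model)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in alphabet}
    inserts = {a + c + b for a, b in splits for c in alphabet}
    return (deletes | swaps | replaces | inserts) - {word}


def expand_phrase_table(
    table: PhraseTable, oov: str, alphabet: str, penalty: float = 0.5
) -> PhraseTable:
    """Return a copy of `table` where `oov` borrows entries from INV variants.

    `penalty` is an illustrative discount on borrowed scores, not a REMOOV parameter.
    """
    expanded = {src: list(entries) for src, entries in table.items()}
    if oov in table:
        return expanded  # the word is already in vocabulary; nothing to add
    borrowed = []
    for variant in sorted(single_edit_variants(oov, alphabet)):
        for target, score in table.get(variant, []):
            borrowed.append((target, score * penalty))
    if borrowed:
        expanded[oov] = borrowed
    return expanded


# Toy data: the misspelling "translaton" is one insertion away from "translation".
table = {"translation": [("traduction", 0.8)], "machine": [("machine", 0.9)]}
expanded = expand_phrase_table(table, "translaton", "abcdefghijklmnopqrstuvwxyz")
print(expanded["translaton"])
```

The original table is left untouched, which mirrors the idea of adding entries online for the input at hand rather than retraining the model. The same borrowed entries could instead be attached as alternative paths in an input lattice.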
Related resources
Four Techniques for Online Handling of Out-of-Vocabulary Words in Arabic-English Statistical Machine Translation
We present four techniques for online handling of Out-of-Vocabulary words in Phrase-based Statistical Machine Translation. The techniques use spelling expansion, morphological expansion, dictionary term expansion and proper name transliteration to reuse or extend a phrase table. We compare the performance of these techniques and combine them. Our results show a consistent improvement over a stat...
Exploiting Parallel Corpus for Handling Out-of-Vocabulary Words
This paper presents a hybrid model for handling out-of-vocabulary words in Japanese-to-English statistical machine translation output by exploiting parallel corpus. As the Japanese writing system makes use of four different script sets (kanji, hiragana, katakana, and romaji), we treat these scripts differently. A machine transliteration model is built to transliterate out-of-vocabulary Japanese k...
Handling of Out-of-vocabulary Words in Japanese-English Machine Translation by Exploiting Parallel Corpus
A large number of loanwords and orthographic variants in Japanese pose a challenge for machine translation. In this article, we present a hybrid model for handling out-of-vocabulary words in Japanese-to-English statistical machine translation output by exploiting parallel corpus. As the Japanese writing system makes use of four different script sets (kanji, hiragana, katakana, and romaji), we t...
Images as Context in Statistical Machine Translation
This paper reports ongoing experiments towards exploiting the use of images to provide additional context for statistical machine translation (SMT). We investigate whether this contextual information can be helpful in targeting two well-known challenges in machine translation: ambiguity (incorrect translation of words that have multiple senses) and out-of-vocabulary words (words left untranslat...
Handling OOV Words in Dialectal Arabic to English Machine Translation
Dialects and standard forms of a language typically share a set of cognates that could bear the same meaning in both varieties or only be shared homographs but serve as faux amis. Moreover, there are words that are used exclusively in the dialect or the standard variety. Both phenomena, faux amis and exclusive vocabulary, are considered out of vocabulary (OOV) phenomena. In this paper, we prese...
Publication date: 2009