Spell Checking Techniques for Replacement of Unknown Words and Data Cleaning for Haitian Creole SMS Translation
Abstract
We report results on translation of SMS messages from Haitian Creole to English. We show improvements by applying spell checking techniques to unknown words and creating a lattice with the best known spelling equivalents. We also used a small cleaned corpus to train a cleaning model that we applied to the noisy corpora.
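As a rough illustration of the idea in the abstract, the sketch below proposes spelling equivalents for unknown words and collects them as per-token alternatives, a simplified stand-in for a lattice. It is a minimal sketch under stated assumptions, not the authors' method: the edit-distance threshold, the function names (`edit_distance`, `spelling_candidates`, `build_alternatives`), and the toy vocabulary are all hypothetical choices made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def spelling_candidates(word, vocab, max_dist=1, top_n=3):
    """Return up to top_n known words within max_dist edits of word.

    max_dist and top_n are illustrative defaults, not values from the paper.
    """
    scored = [(edit_distance(word, v), v) for v in vocab]
    close = sorted((d, v) for d, v in scored if d <= max_dist)
    return [v for _, v in close[:top_n]]


def build_alternatives(tokens, vocab):
    """For each token, list the spellings a decoder could choose from.

    Known words keep a single path; unknown words get their closest known
    spellings, or stay unchanged when nothing is close enough.
    """
    alternatives = []
    for tok in tokens:
        if tok in vocab:
            alternatives.append([tok])
        else:
            alternatives.append(spelling_candidates(tok, vocab) or [tok])
    return alternatives


if __name__ == "__main__":
    # Toy vocabulary, chosen only for illustration.
    vocab = {"mwen", "bezwen", "dlo", "manje"}
    print(build_alternatives(["mwen", "bezwenn", "dlo"], vocab))
```

In a real system the alternatives would typically carry weights and be passed to a decoder that accepts lattice input; this sketch only shows how candidate spellings for out-of-vocabulary tokens might be gathered.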
Similar resources
Noisy SMS Machine Translation in Low-Density Languages
This paper presents the system we developed for the 2011 WMT Haitian Creole–English SMS featured translation task. Applying standard statistical machine translation methods to noisy real-world SMS data in a low-density language setting such as Haitian Creole poses a unique set of challenges, which we attempt to address in this work. Along with techniques to better exploit the limited available ...
CMU Haitian Creole-English Translation System for WMT 2011
This paper describes the statistical machine translation system submitted to the WMT11 Featured Translation Task, which involves translating Haitian Creole SMS messages into English. In our experiments we try to address the issue of noise in the training data, as well as the lack of parallel training data. Spelling normalization is applied to reduce out-of-vocabulary words in the corpus. Using ...
The Value of Monolingual Crowdsourcing in a Real-World Translation Scenario: Simulation using Haitian Creole Emergency SMS Messages
MonoTrans2 is a translation system that combines machine translation (MT) with human computation using two crowds of monolingual source (Haitian Creole) and target (English) speakers. We report on its use in the WMT 2011 Haitian Creole to English translation task, showing that MonoTrans2 translated 38% of the sentences well compared to Google Translate’s 25%.
Findings of the 2011 Workshop on Statistical Machine Translation
This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgme...
Haitian Creole: How to Build and Ship an MT Engine from Scratch
We describe the effort of the Microsoft Translator team to develop a Haitian Creole statistical machine translation engine from scratch in a matter of days. Haitian Creole presents a number of difficulties for developing an SMT system; principal among these are the lack of significant amounts of parallel training data and an inconsistent orthography, both of which lead to data sparseness. We dem...
Publication date: 2011