Dialectal to Standard Arabic Paraphrasing to Improve Arabic-English Statistical Machine Translation
نویسندگان
چکیده
This paper is about improving the quality of Arabic-English statistical machine translation (SMT) on dialectal Arabic text using morphological knowledge. We present a light-weight rule-based approach to producing Modern Standard Arabic (MSA) paraphrases of dialectal Arabic out-of-vocabulary (OOV) words and low frequency words. Our approach extends an existing MSA analyzer with a small number of morphological clitics, and uses transfer rules to generate paraphrase lattices that are input to a state-of-the-art phrasebased SMT system. This approach improves BLEU scores on a blind test set by 0.56 absolute BLEU (or 1.5% relative). A manual error analysis of translated dialectal words shows that our system produces correct translations in 74% of the time for OOVs and 60% of the time for low frequency words.
منابع مشابه
Handling OOV Words in Dialectal Arabic to English Machine Translation
Dialects and standard forms of a language typically share a set of cognates that could bear the same meaning in both varieties or only be shared homographs but serve as faux amis. Moreover, there are words that are used exclusively in the dialect or the standard variety. Both phenomena, faux amis and exclusive vocabulary, are considered out of vocabulary (OOV) phenomena. In this paper, we prese...
متن کاملMulti-Lingual Phrase-Based Statistical Machine Translation for Arabic-English
In this paper, we implement a multilingual Statistical Machine Translation (SMT) system for Arabic-English Translation. Arabic Text can be categorized into standard and dialectal Arabic. These two forms of Arabic differ significantly. Different mono-lingual and multi-lingual hybrid SMT approaches are compared. Mono-lingual systems do always result in better translation accuracy in one Arabic fo...
متن کاملTranslating Dialectal Arabic to English
We present a dialectal Egyptian Arabic to English statistical machine translation system that leverages dialectal to Modern Standard Arabic (MSA) adaptation. In contrast to previous work, we first narrow down the gap between Egyptian and MSA by applying an automatic characterlevel transformational model that changes Egyptian to EG′, which looks similar to MSA. The transformations include morpho...
متن کاملExploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation
Statistical machine translation for dialectal Arabic is characterized by a lack of data since data acquisition involves the transcription and translation of spoken language. In this study we develop techniques for extracting parallel data for one particular dialect of Arabic (Iraqi Arabic) from out-ofdomain corpora in different dialects of Arabic or in Modern Standard Arabic. We compare two dif...
متن کاملUnsupervised Word Segmentation Improves Dialectal Arabic to English Machine Translation
We demonstrate the feasibility of using unsupervised morphological segmentation for dialects of Arabic, which are poor in linguistics resources. Our experiments using a Qatari Arabic to English machine translation system show that unsupervised segmentation helps to improve the translation quality as compared to using no segmentation or to using ATB segmentation, which was especially designed fo...
متن کامل