Morpho-syntactic Arabic Preprocessing for Arabic to English Statistical Machine Translation
Authors
Abstract
Arabic has far richer systems of inflection and derivation than English, which has very little morphology. This morphological difference causes a large gap between the Arabic and English vocabulary sizes in any given parallel training corpus. Segmenting inflected Arabic words is one way to reduce the effect of this rich morphology. In this paper, we describe several statistically and linguistically motivated methods for Arabic word segmentation. We then demonstrate the effectiveness of the proposed methods on the Arabic-English BTEC and NIST tasks.
Similar resources
Integrating morpho-syntactic features in English-Arabic statistical machine translation
This paper presents a hybrid approach to the enhancement of English to Arabic statistical machine translation quality. Machine Translation has been defined as the process that utilizes computer software to translate text from one natural language to another. Arabic, as a morphologically rich language, is a highly flexional language, in that the same root can lead to various forms according to i...
The MIRACL Arabic-English Statistical Machine Translation
This paper describes the MIRACL statistical Machine Translation system and the improvements that were developed during the IWSLT 2010 evaluation campaign. We participated in the Arabic to English BTEC tasks using a phrase-based statistical machine translation approach. In this paper, we first discuss some challenges in translating from Arabic to English and we explore various techniques to impr...
Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation
Syntactic Reordering of the source language to better match the phrase structure of the target language has been shown to improve the performance of phrase-based Statistical Machine Translation. This paper applies syntactic reordering to English-to-Arabic translation. It introduces reordering rules, and motivates them linguistically. It also studies the effect of combining reordering with Arabi...
Improving Arabic-Chinese Statistical Machine Translation using English as Pivot Language
We present a comparison of two approaches for Arabic-Chinese machine translation using English as a pivot language: sentence pivoting and phrase-table pivoting. Our results show that using English as a pivot in either approach outperforms direct translation from Arabic to Chinese. Our best result is the phrase-pivot system which scores higher than direct translation by 1.1 BLEU points. An error...
Syntactic Preprocessing for Statistical Machine Translation
We describe an approach to automatic source-language syntactic preprocessing in the context of Arabic-English phrase-based machine translation. Source-language labeled dependencies, that are word aligned with target language words in a parallel corpus, are used to automatically extract syntactic reordering rules in the same spirit of Xia and McCord (2004) and Zhang et al. (2007). The extracted ...