Towards a Hybrid Rule-based and Statistical Arabic-French Machine Translation System
Abstract
Arabic is a morphologically rich and complex language, which presents significant challenges for natural language processing and machine translation. In this paper, we describe an ongoing effort to build our first Arabic-French phrase-based machine translation system, using the Moses decoder along with other linguistic tools. The results show an improvement in translation quality and a gain in BLEU score after introducing a pre-processing scheme for Arabic and applying rules based on morphological variations of the source language. The proposed approach achieves this without increasing the amount of training data or radically changing the algorithms that underlie the translation or training engines.
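The abstract does not spell out the pre-processing scheme or the morphological rules. As a rough illustration of the kind of source-side processing it describes, the sketch below normalizes Arabic orthography and splits off one leading proclitic before the text would be passed to a phrase-based system. The proclitic list, the minimum stem length, the `+` marker, and the function names are all assumptions made for this example; they are not taken from the paper, and a real system would more likely rely on a morphological analyzer.

```python
import re

# Arabic short-vowel diacritics (U+064B..U+0652) and the tatweel (U+0640).
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")
# Alef variants carrying madda or hamza: U+0622, U+0623, U+0625.
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")

# Hypothetical proclitic list (assumption): wa- "and", bi- "with", li- "for".
PROCLITICS = ("\u0648", "\u0628", "\u0644")
# Assumed guard: only split if at least this many letters remain.
MIN_STEM_LEN = 3


def normalize(token):
    """Remove diacritics and tatweel, then map alef variants to bare alef."""
    token = DIACRITICS.sub("", token)
    return ALEF_VARIANTS.sub("\u0627", token)


def segment(token):
    """Naively split off at most one leading proclitic, marked with '+'."""
    for clitic in PROCLITICS:
        if token.startswith(clitic) and len(token) - len(clitic) >= MIN_STEM_LEN:
            return [clitic + "+", token[len(clitic):]]
    return [token]


def preprocess(sentence):
    """Normalize and segment every whitespace-separated token."""
    out = []
    for tok in sentence.split():
        out.extend(segment(normalize(tok)))
    return " ".join(out)
```

A length guard like `MIN_STEM_LEN` is only a crude stand-in for morphological analysis: it will still mis-split words whose first root letter happens to match a proclitic, which is one reason linguistically informed rules matter for a language like Arabic.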
Similar resources
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Improving Translation to Morphologically Rich Languages (Améliorer la traduction des langages morphologiquement riches) [in French]
While statistical techniques for machine translation have made significant progress in the last 20 years, results for translating to morphologically rich languages are still mixed versus previous-generation rule-based systems. Current research in statistical techniques for translating to morphologically rich languages varies greatly ...
Cutting the Long Tail: Hybrid Language Models for Translation Style Adaptation
In this paper, we address statistical machine translation of public conference talks. Modeling the style of this genre can be very challenging given the shortage of available in-domain training data. We investigate the use of a hybrid LM, where infrequent words are mapped into classes. Hybrid LMs are used to complement word-based LMs with statistics about the language style of the talks. Extens...
Challenges in Building an Arabic-English GHMT System with SMT Components
The research context of this paper is developing hybrid machine translation (MT) systems that exploit the advantages of linguistic rule-based and statistical MT systems. Arabic, as a morphologically rich language, is especially challenging even without addressing the hybridization question. In this paper, we describe the challenges in building an Arabic-English generation-heavy machine translati...
Integrating a Rule-based with a Hierarchical Translation System
Recent developments on hybrid systems that combine rule-based machine translation (RBMT) systems with statistical machine translation (SMT) generally neglect the fact that RBMT systems tend to produce more syntactically well-formed translations than data-driven systems. This paper proposes a method that alleviates this issue by preserving more useful structures produced by RBMT systems and util...