Investigating the potential of post-ordering SMT output to improve translation quality
Abstract
Post-ordering of Statistical Machine Translation (SMT) output to correct word order errors could be a promising area of research to overcome structural divergence between language pairs. This is especially true when it is difficult to incorporate rich linguistic features into the baseline decoder. In this paper, we propose an algorithm for generating oracle reorderings of MT output. We use the oracle reorderings to empirically quantify an upper bound on improvement in translation quality through post-ordering techniques. In our study encompassing multiple language pairs, we show that significant improvement in translation quality can be obtained by applying reordering transformations on the output of the SMT system. This presents a strong case for investing effort in exploring the post-ordering problem.
Similar resources
A Hybrid Machine Translation System Based on a Monotone Decoder
In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...
Collection of a Large Database of French-English SMT Output Corrections
Corpus-based approaches to machine translation (MT) rely on the availability of parallel corpora. To produce user-acceptable translation outputs, such systems need high-quality data to be efficiently trained, optimized and evaluated. However, building a high-quality dataset is a relatively expensive task. In this paper, we describe the data collection and analysis of a large database of 10.881 SM...
Improving the Post-Editing Experience using Translation Recommendation: A User Study
We report findings from a user study with professional post-editors using a translation recommendation framework (He et al., 2010) to integrate Statistical Machine Translation (SMT) output with Translation Memory (TM) systems. The framework recommends SMT outputs to a TM user when it predicts that SMT outputs are more suitable for post-editing than the hits provided by the TM. We analyze the eff...
Combining pre-editing and post-editing to improve SMT of user-generated content
The poor quality of user-generated content (UGC) found in forums hinders both readability and machine-translatability. To improve these two aspects, we have developed human- and machine-oriented pre-editing rules, which correct or reformulate this content. In this paper we present the results of a study which investigates whether pre-editing rules that improve the quality of statistical machine t...
USAAR: An Operation Sequential Model for Automatic Statistical Post-Editing
This paper presents an automatic post-editing (APE) method to improve the translation quality produced by an English–German (EN–DE) statistical machine translation (SMT) system. Our system is based on an Operation Sequential Model (OSM) combined with a phrase-based statistical MT (PB-SMT) system. The system is trained on monolingual settings between MT outputs (TLMT) produced by a black-box MT syste...