Realignment from Finer-grained Alignment to Coarser-grained Alignment to Enhance Mongolian-Chinese SMT
Authors
Abstract
The conventional Mongolian-Chinese statistical machine translation (SMT) model uses Mongolian words and Chinese words to train the system. However, data sparsity, complex Mongolian morphology, and Chinese word segmentation (CWS) errors lead to alignment errors and ambiguities. Other work uses finer-grained Mongolian stems and Chinese characters, which suffers from information loss when inducing translation rules. To tackle this, we propose using finer-grained Mongolian stems and Chinese characters for word alignment, but coarser-grained Mongolian words and Chinese words for translation rule induction (TRI) and decoding. We present a heuristic technique to transform Chinese character-based alignment into word-based alignment. Experimentally, our method outperforms both baselines (fully finer-grained and fully coarser-grained) in terms of alignment quality and translation performance.
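The abstract does not spell out the heuristic used to convert character-based alignment into word-based alignment. As an illustration only, the sketch below assumes the simplest possible rule: a Chinese word inherits every link held by any of its characters. It also assumes that each Mongolian stem corresponds to exactly one word, so source-side indices carry over unchanged. The function names and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple


def char_to_word_index(words: List[str]) -> List[int]:
    """Map each character position to the index of the word containing it."""
    mapping: List[int] = []
    for word_idx, word in enumerate(words):
        mapping.extend([word_idx] * len(word))
    return mapping


def project_alignment(
    char_links: Set[Tuple[int, int]], zh_words: List[str]
) -> Set[Tuple[int, int]]:
    """Project (source_idx, zh_char_idx) links to (source_idx, zh_word_idx).

    Assumed rule (not the paper's): a word is linked to a source token
    if at least one of its characters is linked to that token.
    """
    char2word = char_to_word_index(zh_words)
    return {(src, char2word[ch]) for src, ch in char_links}


# Hypothetical segmented Chinese sentence: 8 characters in 3 words.
zh_words = ["我们", "喜欢", "机器翻译"]
# Hypothetical stem-to-character links: (stem index, character index).
char_links = {(0, 0), (0, 1), (1, 2), (2, 4), (2, 7)}

word_links = project_alignment(char_links, zh_words)
print(sorted(word_links))
```

Under this rule, several character-level links into the same word collapse into a single word-level link. A real system would probably also need to resolve conflicts, for example when characters of one word are linked to distant source tokens.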
Similar resources
Using Punctuations and Lengths for Bilingual Sub-sentential Alignment
We present a new approach to aligning bilingual English and Chinese text at the sub-sentential level by interleaving matches of alphabetic text and punctuation. With sub-sentential alignment, we expect to improve the effectiveness of alignment at the word, chunk, and phrase levels and to provide finer-grained and more reusable translation memory.
Interleaving Text and Punctuations for Bilingual Sub-sentential Alignment
We present a new approach to aligning bilingual English and Chinese text at the sub-sentential level by interleaving matches of alphabetic text and punctuation. With sub-sentential alignment, we expect to improve the effectiveness of alignment at the word, chunk, and phrase levels and to provide finer-grained and more reusable translation memory.
On the reliability and inter-annotator agreement of human semantic MT evaluation via HMEANT
We present analyses showing that HMEANT is a reliable, accurate, and fine-grained semantic-frame-based human MT evaluation metric with high inter-annotator agreement (IAA) and correlation with human adequacy judgments, despite requiring only minimal training of about 15 minutes for lay annotators. Previous work shows that the IAA on the semantic role labeling (SRL) subtask within HMEANT is ove...
Recrystallization texture during ECAP processing of ultrafine/nano grained magnesium alloy
An ultrafine/nano grained AZ31 magnesium alloy was produced through four-pass ECAP processing. TEM microscopy indicated that recrystallized regions included nano grains of 75 nm. Pole figures showed that a fiber basal texture with two-pole peaks was developed after four passes, where a basal pole peak lies parallel to the extrusion direction (ED) and the other ~20° away from the transverse dire...
Enhancing Statistical Machine Translation with Character Alignment
The dominant practice in statistical machine translation (SMT) uses the same Chinese word segmentation specification in both the alignment and translation rule induction steps when building a Chinese-English SMT system. This may be suboptimal, because the word segmentation that is better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses two di...