QCRI-MES Submission at WMT13: Using Transliteration Mining to Improve Statistical Machine Translation

Authors

  • Hassan Sajjad
  • Svetlana Smekalova
  • Nadir Durrani
  • Alexander M. Fraser
  • Helmut Schmid

Abstract

This paper describes QCRI-MES’s submission on the English-Russian dataset to the Eighth Workshop on Statistical Machine Translation. We generate improved word alignments of the training data by incorporating an unsupervised transliteration mining module into GIZA++, and build a phrase-based machine translation system. For tuning, we use a variant of PRO that provides better weights by optimizing BLEU+1 at the corpus level. We transliterate out-of-vocabulary words in a post-processing step, using a transliteration system trained on the transliteration pairs extracted by an unsupervised transliteration mining system. For the Russian-to-English translation direction, we apply linguistically motivated preprocessing to the Russian side of the data.
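BLEU+1 is a smoothed, sentence-level version of BLEU that lets a PRO-style tuner score individual hypotheses. The sketch below shows one common formulation, assuming a single reference, whitespace tokenization, and add-one smoothing applied only to n-gram orders n ≥ 2. It is an illustration, not necessarily the exact corpus-level variant the authors optimize; the function names `ngram_counts` and `bleu_plus_one` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_plus_one(hyp, ref, max_n=4):
    """Sentence-level BLEU with add-one smoothing on n-gram orders n >= 2.

    Assumption: one reference sentence, whitespace-tokenized input.
    """
    hyp_toks, ref_toks = hyp.split(), ref.split()
    if not hyp_toks:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        hyp_ng = ngram_counts(hyp_toks, n)
        ref_ng = ngram_counts(ref_toks, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_ng[g]) for g, c in hyp_ng.items())
        total = max(len(hyp_toks) - n + 1, 0)
        # The "+1": smooth higher orders so one missing n-gram order does not zero the score.
        smooth = 1 if n > 1 else 0
        if matches + smooth == 0:
            return 0.0
        log_prec += math.log((matches + smooth) / (total + smooth))
    # Log of the brevity penalty: negative only when the hypothesis is shorter than the reference.
    log_bp = min(0.0, 1.0 - len(ref_toks) / len(hyp_toks))
    return math.exp(log_bp + log_prec / max_n)
```

For example, `bleu_plus_one("the cat", "the cat sat on the mat")` has perfect smoothed precisions at every order but is penalized for brevity, giving exp(-2), roughly 0.135.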

Similar resources

Munich-Edinburgh-Stuttgart Submissions of OSM Systems at WMT13

This paper describes Munich-Edinburgh-Stuttgart’s submissions to the Eighth Workshop on Statistical Machine Translation. We report results of the translation tasks from German, Spanish, Czech and Russian into English and from English to German, Spanish, Czech, French and Russian. The systems described in this paper use OSM (Operation Sequence Model). We explain different pre-/post-processing ste...

Egyptian Arabic to English Statistical Machine Translation System for NIST OpenMT'2015

The paper describes the Egyptian Arabic-to-English statistical machine translation (SMT) system that the QCRI-Columbia-NYUAD (QCN) group submitted to the NIST OpenMT’2015 competition. The competition focused on informal dialectal Arabic, as used in SMS, chat, and speech. Thus, our efforts focused on processing and standardizing Arabic, e.g., using tools such as 3arrib and MADAMIRA. We further tra...

Yandex School of Data Analysis Machine Translation Systems for WMT13

This paper describes the English-Russian and Russian-English statistical machine translation (SMT) systems developed at Yandex School of Data Analysis for the shared translation task of the ACL 2013 Eighth Workshop on Statistical Machine Translation. We adopted the phrase-based SMT approach and evaluated a number of different techniques, including data filtering, spelling correction, alignment of l...

The University of Cambridge Russian-English System at WMT13

This paper describes the University of Cambridge submission to the Eighth Workshop on Statistical Machine Translation. We report results for the Russian-English translation task. We use multiple segmentations for the Russian input language. We employ the Hadoop framework to extract rules. The decoder is HiFST, a hierarchical phrase-based decoder implemented using weighted finite-state transducers...

The Application of Bayesian Alignment Techniques to Transliteration Generation and Mining

Bayesian techniques have recently been applied to many areas of natural language processing, and have proven themselves particularly useful in areas involving segmentation and alignment. This paper looks at the direct application of these techniques to the co-segmentation/alignment of grapheme sequences. We detail a novel Bayesian model for unsupervised bilingual character sequence alignment of...

Publication date: 2013