Simultaneous Word-Morpheme Alignment for Statistical Machine Translation

نویسندگان

  • Elif Eyigöz
  • Daniel Gildea
  • Kemal Oflazer
چکیده

Current word alignment models for statistical machine translation do not address morphology beyond merely splitting words. We present a two-level alignment model that distinguishes between words and morphemes, in which we embed an IBM Model 1 inside an HMM based word alignment model. The model jointly induces word and morpheme alignments using an EM algorithm. We evaluated our model on Turkish-English parallel data. We obtained significant improvement of BLEU scores over IBM Model 4. Our results indicate that utilizing information from morphology improves the quality of word alignments.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

This paper describes an unsupervised dynamic graphical model for morphological segmentation and bilingual morpheme alignment for statistical machine translation. The model extends Hidden Semi-Markov chain models by using factored output nodes and special structures for its conditional probability distributions. It relies on morpho-syntactic and lexical source-side information (part-of-speech, m...

متن کامل

Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation

We investigate different representational granularities for sub-lexical representation in statistical machine translation work from English to Turkish. We find that (i) representing both Turkish and English at the morpheme-level but with some selective morpheme-grouping on the Turkish side of the training data, (ii) augmenting the training data with “sentences” comprising only the content words...

متن کامل

Statistical Machine Translation into a Morphologically Complex Language

In this paper, we present the results of our investigation into phrase-based statistical machine translation from English into Turkish – an agglutinative language with very productive inflectional and derivational word-formation processes. We investigate different representational granularities for morphological structure and find that (i) representing both Turkish and English at the morpheme-l...

متن کامل

A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages

We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level ...

متن کامل

Multi-Rate HMMs for Word Alignment

We apply multi-rate HMMs, a tree structured HMM model, to the word-alignment problem. Multi-rate HMMs allow us to model reordering at both the morpheme level and the word level in a hierarchical fashion. This approach leads to better machine translation results than a morphemeaware model that does not explicitly model morpheme reordering.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013