Efficient Statistical Machine Translation Algorithm based on IBM Model 4

نویسندگان

  • Kai Li
  • Yuejie Zhang
چکیده

This paper describes our methodologies for NTCIR-7 Patent Translation Task, and reports the official results based on English and Japanese corpus. Our system was a novel combination pattern of machine translation algorithms including classical statistical method -IBM model and highly efficient decoding algorithm. The result of this new method is relatively decent, and its speed is also fast. It can be considered as a candidate for such situations as people who want to get a kind of quick and simple grasp of the main idea of a text.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

Simultaneous Word-Morpheme Alignment for Statistical Machine Translation

Current word alignment models for statistical machine translation do not address morphology beyond merely splitting words. We present a two-level alignment model that distinguishes between words and morphemes, in which we embed an IBM Model 1 inside an HMM based word alignment model. The model jointly induces word and morpheme alignments using an EM algorithm. We evaluated our model on Turkish-...

متن کامل

11-928 Master’s Thesis Symmetric Probabilistic Alignment

The CMU Example-Based Machine Translation (EBMT) system has been deployed successfully in many projects for years. But even though a good alignment algorithm is essential since the CMU EBMT system uses parallel corpora, it has relatively less studied than other components of EBMT. For this reason, we developed a new alignment algorithm which uses statistical information drawn from parallel corp...

متن کامل

A Convex Alternative to IBM Model 2

The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multi...

متن کامل

Revisiting Optimal Decoding for Machine Translation IBM Model 4

This paper revisits optimal decoding for statistical machine translation using IBM Model 4. We show that exact/optimal inference using Integer Linear Programming is more practical than previously suggested when used in conjunction with the Cutting-Plane Algorithm. In our experiments we see that exact inference can provide a gain of up to one BLEU point for sentences of length up to 30 tokens.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008