Language Model Adaptation for Statistical Machine Translation Based on Information Retrieval

نویسندگان

  • Matthias Eck
  • Stephan Vogel
  • Alexander H. Waibel
چکیده

Language modeling is an important part for both speech recognition and machine translation systems. Adaptation has been successfully applied to language models for speech recognition. In this paper we present experiments concerning language model adaptation for statistical machine translation. We develop a method to adapt language models using information retrieval methods. The adapted language models drastically reduce perplexity over a general language model and we can show that it is possible to improve the translation quality of a statistical machine translation using those adapted language models instead of a general language model.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

NTT statistical machine translation system for IWSLT 2010

In this year’s IWSLT evaluation campaign (TALK task), we applied three adaptation techniques: (1) training data selection based on information retrieval approach, (2) subsentence segmentation, and (3) language model adaptation using source-side of the test set. We also applied a sequential labeling method based on conditional random fields for restoring punctuation markers in the ASR input cond...

متن کامل

Language Model Adaptation for Automatic Speech Recognition and Statistical Machine Translation

Language modeling is critical and indispensable for many natural language applications such as automatic speech recognition and machine translation. Due to the complexity of natural language grammars, it is almost impossible to construct language models by a set of linguistic rules; therefore statistical techniques have been dominant for language modeling over the last few decades. All statisti...

متن کامل

Domain Adaptation of Statistical Machine Translation Models with Monolingual Data for Cross Lingual Information Retrieval

Statistical Machine Translation (SMT) is often used as a black-box in CLIR tasks. We propose an adaptation method for an SMT model relying on the monolingual statistics that can be extracted from the document collection (both source and target if available). We evaluate our approach on CLEF Domain Specific task (German-English and English-German) and show that very simple document collection st...

متن کامل

Translation-based ranking in cross-language information retrieval

Today’s amount of user-generated, multilingual textual data generates the necessity for information processing systems, where cross-linguality, i.e the ability to work on more than one language, is fully integrated into the underlying models. In the particular context of Information Retrieval (IR), this amounts to rank and retrieve relevant documents from a large repository in language A, given...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004