Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation

Authors

  • Tsuyoshi Okita
  • Yvette Graham
  • Andy Way
Abstract

Word alignment estimates a lexical translation probability p(e|f), or a correspondence g(e, f) whose value is either 0 or 1, between a source word f and a target word e in given bilingual sentences. In practice, this formulation does not account for ‘noise’ (or outliers), which may cause problems depending on the corpus. N-to-m mapping objects, such as paraphrases, non-literal translations, and multiword expressions, may appear both as noise and as valid training data. From this perspective, this paper addresses two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.
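To make the two quantities in the abstract concrete, the following is a minimal sketch of IBM Model 1, a standard baseline word aligner, and not the noise-sensitive method the paper proposes. It estimates p(e|f) with EM and then derives a 0/1 correspondence g(e, f) by linking each target word to its most probable source word. The function names `ibm_model1` and `hard_alignment`, the toy German-English corpus, and the omission of a NULL source word are all assumptions made for illustration.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate t[(e, f)] ~ p(e|f) by EM (IBM Model 1 sketch, no NULL word)."""
    target_vocab = {e for _, tgt in pairs for e in tgt}
    # Start from a uniform distribution over the target vocabulary.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of (e, f) links
        total = defaultdict(float)  # expected number of links leaving f
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # M-step: renormalise expected counts into probabilities.
        t = defaultdict(
            float, {(e, f): c / total[f] for (e, f), c in count.items()}
        )
    return t


def hard_alignment(t, src, tgt):
    """Return the (e, f) pairs with g(e, f) = 1: best source word per target word."""
    return {(e, max(src, key=lambda f: t[(e, f)])) for e in tgt}


# Hypothetical toy corpus: (source sentence f, target sentence e).
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]
t = ibm_model1(corpus)
links = hard_alignment(t, ["das", "haus"], ["the", "house"])
```

Because this baseline forces every target word onto exactly one source word, n-to-m objects such as multiword expressions cannot be represented directly, which is the kind of mismatch the abstract describes as noise.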

Similar resources

Enhancing Statistical Machine Translation with Character Alignment

The dominant practice in statistical machine translation (SMT) uses the same Chinese word segmentation specification in both the alignment and translation rule induction steps when building a Chinese-English SMT system. This may be suboptimal, because the word segmentation that is better for alignment is not necessarily better for translation. To tackle this, we propose a framework that uses two di...

Using Complexity to Simplify Knowledge Translation; Comment on “Using Complexity and Network Concepts to Inform Healthcare Knowledge Translation”

Putting health theories, research and knowledge into practice is a challenge referred to as the knowledge-to-action gap. Knowledge translation (KT), and its related concepts of knowledge mobilization, implementation science and research impact, emerged to mitigate this gap. While the social interaction view of KT has gained currency, scholars have not easily made a link between KT and the concep...

Alignment-Based Neural Machine Translation

Neural machine translation (NMT) has emerged recently as a promising statistical machine translation approach. In NMT, neural networks (NN) are directly used to produce translations, without relying on a pre-existing translation framework. In this work, we take a step towards bridging the gap between conventional word alignment models and NMT. We follow the hidden Markov model (HMM) approach th...

Comparative Study of Word Alignment Heuristics and Phrase-Based SMT

This paper compares six different word alignment heuristics and their impact on translation quality. We also propose a method to filter the noise in the phrase tables extracted by these heuristic methods and examine the effectiveness of combining the methods. Experiments are performed on the Europarl corpus, where a multilingual in-domain training corpus, an in-domain test s...

From Knowing to Doing—From the Academy to Practice; Comment on “The Many Meanings of Evidence: Implications for the Translational Science Agenda in Healthcare”

In this commentary, the idea of closing the gap between knowing and doing through closing the gap between academics and practitioners is explored. The two-communities approach to knowledge production and use has predominated within healthcare, resulting in a separation between the worlds of research and practice, and, therefore, between its producers and users. Meaningful collaborations betwee...

Publication date: 2010