A Discriminative Latent Variable-Based "DE" Classifier for Chinese-English SMT

Authors

  • Jinhua Du
  • Andy Way
Abstract

Syntactic reordering on the source side is an effective way of handling word order differences. The 的 (DE) construction is a flexible and ubiquitous syntactic structure in Chinese and a major source of translation errors. In this paper, we propose a new classifier model, the discriminative latent variable model (DPLVM), to classify the DE construction more accurately and hence improve translation quality. We also propose a new feature which can automatically learn the reordering rules to a certain extent. The experimental results show that the MT systems using the data reordered by our proposed model outperform the baseline systems by 6.42% and 3.08% relative in terms of BLEU score on PB-SMT and hierarchical phrase-based MT respectively. In addition, we analyse the impact of DE annotation on word alignment and on the SMT phrase table.


Similar resources

The HDU Discriminative SMT System for Constrained Data PatentMT at NTCIR10

We describe the statistical machine translation (SMT) systems developed at Heidelberg University for the Chinese-to-English and Japanese-to-English PatentMT subtasks at the NTCIR10 workshop. The core system used in both subtasks is a combination of hierarchical phrase-based translation and discriminative training using either large feature sets and ℓ1/ℓ2 regularization (for Japanese-to-English) ...

Latent Structure Discriminative Learning for Natural Language Processing

Natural language is rich with layers of implicit structure, and previous research has shown that we can take advantage of this structure to make more accurate models. Most attempts to utilize forms of implicit natural language structure for natural language processing tasks have assumed a pre-defined structural analysis before training the task-specific model. However, rather than fixing the la...

The Impact of Source–Side Syntactic Reordering on Hierarchical Phrase-based SMT

Syntactic reordering has been demonstrated to be helpful and effective for handling different word orders between source and target languages in SMT. However, in terms of hierarchical PB-SMT (HPB), does syntactic reordering still have a significant impact on performance? This paper introduces a reordering approach which explores the 的 (DE) grammatical structure in Chinese. We employ the S...

Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

We present a discriminative learning method to improve the consistency of translations in phrase-based Statistical Machine Translation (SMT) systems. Our method is inspired by Translation Memory (TM) systems which are widely used by human translators in industrial settings. We constrain the translation of an input sentence using the most similar ‘translation example’ retrieved from the TM. Diff...

Robust Approach to Abbreviating Terms: A Discriminative Latent Variable Model with Global Information

The present paper describes a robust approach for abbreviating terms. First, in order to incorporate non-local information into abbreviation generation tasks, we present both implicit and explicit solutions: the latent variable model, or alternatively, the label encoding approach with global information. Although the two approaches compete with one another, we demonstrate that these approaches ...

Publication date: 2010