Using Dependency Order Templates to Improve Generality in Translation

نویسندگان

  • Arul Menezes
  • Chris Quirk
چکیده

Today's statistical machine translation systems generalize poorly to new domains. Even small shifts can cause precipitous drops in translation quality. Phrasal systems rely heavily, for both reordering and contextual translation, on long phrases that simply fail to match outof-domain text. Hierarchical systems attempt to generalize these phrases but their learned rules are subject to severe constraints. Syntactic systems can learn lexicalized and unlexicalized rules, but the joint modeling of lexical choice and reordering can narrow the applicability of learned rules. The treelet approach models reordering separately from lexical choice, using a discriminatively trained order model, which allows treelets to apply broadly, and has shown better generalization to new domains, but suffers a factorially large search space. We introduce a new reordering model based on dependency order templates, and show that it outperforms both phrasal and treelet systems on in-domain and out-of-domain text, while limiting the search space.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploratory-cumulative vs. Disputational Talk on Cognitive Dependency of Translation Studies: Intermediate level students in focus

The present study set out to determine the effect of implementing exploratory-cumulative talk in comparison to disputational talk on cognitive (meaning development and organization of thought as well as problem solving ability) dependency of intermediate level students in translation studies. In order to achieve the objectives of the study, a quasi-experimental-pretest-posttest-statistical stud...

متن کامل

Dependency Treelet Translation: Syntactically Informed Phrasal SMT

We describe a novel approach to statistical machine translation that combines syntactic information in the source language with recent advances in phrasal translation. This method requires a source-language dependency parser, target language word segmentation and an unsupervised word alignment component. We align a parallel corpus, project the source dependency parse onto the target sentence, e...

متن کامل

Learning Translation Templates for Closely Related Languages

Many researchers have worked on example-based machine translation and different techniques have been investigated in the area. In literature, a method of using translation templates learned from bilingual example pairs was proposed. The paper investigates the possibility of applying the same idea for close languages where word order is preserved. In addition to applying the original algorithm f...

متن کامل

Improving Fluency by Reordering Target Constituents Using MST Parser in English-to-Japanese Phrase-based SMT

We propose a reordering method to improve the fluency of the output of the phrase-based SMT (PBSMT) system. We parse the translation results that follow the source language order into non-projective dependency trees, then reorder dependency trees to obtain fluent target sentences. Our method ensures that the translation results are grammatically correct and achieves major improvements over PBSM...

متن کامل

TASSER_low-zsc: an approach to improve structure prediction using low z-score-ranked templates.

In a variety of threading methods, often poorly ranked (low z-score) templates have good alignments. Here, a new method, TASSER_low-zsc that identifies these low z-score-ranked templates to improve protein structure prediction accuracy, is described. The approach consists of clustering of threading templates by affinity propagation on the basis of structural similarity (thread_cluster) followed...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007