Translation Model Adaptation Using Genre-Revealing Text Features

Authors

  • Marlies van der Wees
  • Arianna Bisazza
  • Christof Monz
Abstract

Research in domain adaptation for statistical machine translation (SMT) has resulted in various approaches that adapt system components to specific translation tasks. The concept of a domain, however, is not precisely defined, and most approaches rely on provenance information or manual subcorpus labels, while genre differences have not been addressed explicitly. Motivated by the large translation quality gap that is commonly observed between different genres in a test corpus, we explore the use of document-level genre-revealing text features for the task of translation model adaptation. Results show that automatic indicators of genre can replace manual subcorpus labels, yielding significant improvements across two test sets of up to 0.9 BLEU. In addition, we find that our genre-adapted translation models encourage document-level translation consistency.

Similar resources

Generic Analysis of Literary Translation: A Case Study of Contemporary English Short Stories

Translation of a literary text is a difficult task, for understanding literature requires knowledge of various linguistic levels of a literary text in addition to strategies and methods of translation. To this should still be added cognitive-based translation training which helps practitioners preserve the aesthetic aspects of a literary text. Focusing on short story as a genre with both ...

The Effect of Genre Awareness on English Translation Quality and Pedagogy: A Case of News Reports Translation as an Academic Curriculum

To produce an adequate translation, language students are required to learn a variety of language features, including syntax, semantics and pragmatics. Considering the curriculum language learners are faced with, one can claim that almost all language students in Iran are taught these features in their academic settings, including linguistics courses. Yet, there are some aspects of language which a...

A Summary Writing Model Based on Van Dijk’s Concept of Macrostructure and its Application within the Genre-Based Approach

This study was an attempt to provide a comprehensive model for summary writing based on the model of Van Dijk’s concept of macrostructures. The effectiveness of the model was examined in a genre-based quasi-experimental study with the data collection procedure lasting a semester. The participants included 60 female English learners divided into two experimental and control groups. The results o...

Biber Redux: Reconsidering Dimensions of Variation in American English

Genre classification has been found to improve performance in many applications of statistical NLP, including language modeling for spoken language, domain adaptation of statistical parsers, and machine translation. It has also been found to benefit retrieval of spoken or written documents. At its base, however, classification assumes separability. This paper revisits an assumption that genre v...

Selection-Based Language Model for Domain Adaptation using Topic Modeling

This paper introduces a selection-based LM using topic modeling for the purpose of domain adaptation, which is often required in Statistical Machine Translation. This selection-based LM slightly outperforms the state-of-the-art Moore-Lewis LM by 1.0% for EN-ES and 0.7% for ES-EN in terms of BLEU. The performance gain in terms of perplexity was 8% over the Moore-Lewis LM and 17%...

Publication date: 2015