Japanese-to-English Patent Translation System based on Domain-adapted Word Segmentation and Post-ordering

Authors

  • Katsuhito Sudoh
  • Masaaki Nagata
  • Shinsuke Mori
  • Tatsuya Kawahara
Abstract

This paper presents a Japanese-to-English statistical machine translation system specialized for patent translation. Patents are practically useful technical documents, but translating them requires different efforts from general-purpose translation. Japanese-to-English patent translation poses two important problems: long-distance reordering and the lexical translation of many domain-specific terms. We integrated a novel lexical translation method for domain-specific terms with a syntax-based post-ordering framework, which explicitly divides the machine translation problem into lexical translation and reordering to make syntax-based translation efficient. The proposed lexical translation consists of domain-adapted word segmentation and unknown-word transliteration. Experimental results show that our system achieves better translation accuracy in BLEU and TER than the baseline methods.
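The two-stage decomposition described in the abstract (lexical translation first, reordering second) can be illustrated with a toy sketch. It is not the authors' implementation: the lexicon, the hand-built binary tree, and its `swap` flags are hypothetical stand-ins for the trained lexical translation model and the syntax-based reordering model a real post-ordering system would learn. Stage one translates words while keeping Japanese word order; stage two produces English word order by inverting the children of flagged tree nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Binary tree node over target-language words.

    Leaves carry a word; internal nodes carry two children and a
    hypothetical `swap` flag saying whether their order is inverted.
    """
    word: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    swap: bool = False


def lexical_translate(src_tokens: List[str], lexicon: Dict[str, str]) -> List[str]:
    """Stage 1: translate each source token monotonically (source order kept).

    Unknown tokens are passed through unchanged here; the system described
    above transliterates unknown words instead.
    """
    return [lexicon.get(tok, tok) for tok in src_tokens]


def reorder(node: Node) -> List[str]:
    """Stage 2: read the leaves left to right, inverting children of swap nodes."""
    if node.word is not None:
        return [node.word]
    left = reorder(node.left)
    right = reorder(node.right)
    return right + left if node.swap else left + right


# Toy Japanese input (already segmented, particles omitted): "device signal transmit".
src = ["装置", "信号", "送信する"]
lexicon = {"装置": "the device", "信号": "a signal", "送信する": "transmits"}

translated = lexical_translate(src, lexicon)
# Hand-built tree standing in for a parser's output: the verb phrase
# "signal transmits" is inverted to the English order "transmits signal".
tree = Node(
    left=Node(word=translated[0]),
    right=Node(left=Node(word=translated[1]), right=Node(word=translated[2]), swap=True),
)
print(" ".join(reorder(tree)))  # the device transmits a signal
```

Keeping the two stages separate means the lexical stage never has to model word order, and the reordering stage only has to decide, node by node, whether to keep or invert the order of two spans.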

Similar Resources

A Japanese-to-English Statistical Machine Translation System for Technical Documents

This thesis addresses a Japanese-to-English statistical machine translation (SMT) system for technical documents. Machine translation (MT) is a promising solution for growing translation needs. Japanese-to-English MT is one of the most difficult language pairs due to their large lexical and syntactic differences. This thesis work focuses on patents as the most demanded technical documents that ...

Post-ordering in Statistical Machine Translation

In the field of statistical machine translation (SMT), pre-ordering is an approach that has recently attracted attention: it reorders source language words into the target language order prior to SMT decoding. It is effective for long-distance reordering in SMT, especially between languages with very different word orders, such as English and Japanese. Its key idea is to decompose the SMT problem into two subproblems of t...

The TRGTK's System Description of the PatentMT Task at the NTCIR-10 Workshop

This paper introduces the TRGTK’s system for Patent Machine Translation at the NTCIR-10 Workshop. In this year’s program, we participate in three subtasks: Chinese-English, English-Japanese, and Japanese-English. We submit the required system results for Intrinsic Evaluation (IE), Patent Examination Evaluation (PEE), Chronological Evaluation (ChE), and Multilingual Evaluation (ME). Different from last y...

Learning of Linear Ordering Problems and its Application to J-E Patent Translation in NTCIR-9 PatentMT

This paper describes the patent translation system submitted for the NTCIR-9 PatentMT task. We applied the Linear Ordering Problem (LOP) based reordering model [16] to Japanese-to-English translation to deal with the substantial difference in the word order between the two languages.

A Phrase Combination Approach to Patent SMT

This paper presents a phrase combination approach to patent SMT (Statistical Machine Translation) for Japanese to English. To minimize the segmentation problems caused by the many OOV (out-of-vocabulary) words in patent texts, character-based translation phrases are first introduced to avoid segmentation errors in translation modeling. Then the word-based translation phrases, which a...

Publication date: 2014