A Phrase Combination Approach to Patent SMT

Authors

  • Junguo Zhu
  • Muyun Yang
  • Tiejun Zhao
  • Sheng Li
  • Haoliang Qi
Abstract

This paper presents a phrase combination approach to patent SMT (statistical machine translation) from Japanese to English. To minimize the segmentation problems caused by the many OOV (out-of-vocabulary) words in patent texts, character-based translation phrases are first introduced to avoid segmentation errors in translation modeling. Word-based translation phrases, which are built to use the dependent word-level information, are then combined with the character-based translation table by linearly interpolating their probabilities. Our experiments on the NTCIR corpus indicate that the proposed method significantly outperforms the original word-based approach.
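The linear combination of the two translation tables described in the abstract can be illustrated with a minimal sketch. The abstract does not give the table format, the interpolation weight, or how missing entries are treated, so everything here is an assumption made for illustration: the function name `combine_phrase_tables`, the weight `lam`, the dictionary layout, the toy probabilities, and the choice to treat a phrase pair absent from one table as having probability 0 in that table.

```python
def combine_phrase_tables(word_table, char_table, lam=0.5):
    """Linearly interpolate two phrase tables (illustrative sketch only).

    Each table maps (source_phrase, target_phrase) -> translation probability.
    `lam` is a hypothetical weight on the word-based table; the
    character-based table receives weight 1 - lam.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")

    combined = {}
    # Visit every phrase pair that occurs in at least one of the tables.
    for pair in set(word_table) | set(char_table):
        # Assumption: a pair missing from a table contributes probability 0.
        p_word = word_table.get(pair, 0.0)
        p_char = char_table.get(pair, 0.0)
        combined[pair] = lam * p_word + (1.0 - lam) * p_char
    return combined


# Toy tables with made-up probabilities, not taken from the paper.
word_table = {("特許", "patent"): 0.8, ("特許", "license"): 0.2}
char_table = {("特許", "patent"): 0.6, ("特", "special"): 0.5}

combined = combine_phrase_tables(word_table, char_table, lam=0.5)
for (src, tgt), prob in sorted(combined.items()):
    print(f"{src} -> {tgt}: {prob:.3f}")
```

In a real system the weight would normally be tuned on held-out data, and the combined scores would feed the decoder alongside its other features; neither step is shown here.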

Similar resources

Phrase Alignment for Integration of SMT and RBMT Resources

A novel approach is presented for extracting syntactically motivated phrase alignments. In this method we can incorporate conventional resources such as dictionaries and grammar rules into a statistical optimization framework for phrase alignment. The method extracts bilingual phrases by incrementally merging adjacent words or phrases on both source and target language sides in accordance with ...

Phrase-Level Combination of SMT and TM Using Constrained Word Lattice

Constrained translation has improved statistical machine translation (SMT) by combining it with translation memory (TM) at the sentence level. In this paper, we propose using a constrained word lattice, which encodes input phrases and TM constraints together, to combine SMT and TM at the phrase level. Experiments on English–Chinese and English–French show that our approach is significantly better than...

The RWTH Aachen System for NTCIR-9 PatentMT

This paper describes the statistical machine translation (SMT) systems developed by RWTH Aachen University for the Patent Translation task of the 9th NTCIR Workshop. Both phrase-based and hierarchical SMT systems were trained for the constrained Japanese-English and Chinese-English tasks. Experiments were conducted to compare different training data sets, training methods and optimization criter...

Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017

Japan Patent Information Organization (Japio) participates in patent subtasks (JPC-EJ/JE/CJ/KJ) with phrase-based statistical machine translation (SMT) and neural machine translation (NMT) systems which are trained with its own patent corpora in addition to the subtask corpora provided by organizers of WAT2017. In EJ and CJ subtasks, SMT and NMT systems whose sizes of training corpora are about...

Patent NMT integrated with Large Vocabulary Phrase Translation by SMT at WAT 2017

Neural machine translation (NMT) cannot handle a larger vocabulary because the training complexity and decoding complexity proportionally increase with the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that are observed infrequently. Long et al. (2017) proposed to select phrases that contain out-of-vocabulary...

Publication date: 2008