A Phrase Combination Approach to Patent SMT
Abstract
This paper presents a phrase combination approach to patent statistical machine translation (SMT) from Japanese to English. Patent texts are rich in out-of-vocabulary (OOV) words, which cause segmentation problems; to minimize them, character-based translation phrases are first introduced so that segmentation errors do not enter translation modeling. Word-based translation phrases, which are built to exploit dependent word-level information, are then combined with the character-based translation table by linearly interpolating their probabilities. Experiments on the NTCIR corpus indicate that the proposed method significantly outperforms the original word-based approach.
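The linear combination of the two translation tables can be illustrated with a minimal Python sketch. The abstract does not give the exact formula, so everything here is an assumption made for illustration: the function name `interpolate_phrase_tables`, the weight `lam`, the toy probabilities, the choice to treat a phrase pair missing from one table as having probability 0 in that table, and the premise that both tables are keyed by the same unsegmented source string.

```python
from typing import Dict, Tuple

# A phrase table maps (source phrase, target phrase) to a translation probability.
PhraseTable = Dict[Tuple[str, str], float]


def interpolate_phrase_tables(
    char_table: PhraseTable,
    word_table: PhraseTable,
    lam: float = 0.5,
) -> PhraseTable:
    """Linearly interpolate two phrase tables.

    p(e|f) = lam * p_char(e|f) + (1 - lam) * p_word(e|f)

    `lam` is a hypothetical weight; a pair absent from one table is
    assumed to have probability 0 there (an assumption, not the paper's rule).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    combined: PhraseTable = {}
    for pair in set(char_table) | set(word_table):
        p_char = char_table.get(pair, 0.0)
        p_word = word_table.get(pair, 0.0)
        combined[pair] = lam * p_char + (1.0 - lam) * p_word
    return combined


# Toy tables with made-up probabilities for one source phrase.
char_table = {("特許", "patent"): 0.6, ("特許", "patents"): 0.4}
word_table = {("特許", "patent"): 0.8, ("特許", "a patent"): 0.2}

combined = interpolate_phrase_tables(char_table, word_table, lam=0.5)
for (src, tgt), prob in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(f"{src} -> {tgt}: {prob:.2f}")
```

Because each input table is a proper distribution over targets for the source phrase, the interpolated scores for that phrase still sum to one. In practice the weight would be tuned on held-out data rather than fixed by hand.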
Similar Resources
Phrase Alignment for Integration of SMT and RBMT Resources
A novel approach is presented for extracting syntactically motivated phrase alignments. In this method we can incorporate conventional resources such as dictionaries and grammar rules into a statistical optimization framework for phrase alignment. The method extracts bilingual phrases by incrementally merging adjacent words or phrases on both source and target language sides in accordance with ...
Phrase-Level Combination of SMT and TM Using Constrained Word Lattice
Constrained translation has improved statistical machine translation (SMT) by combining it with translation memory (TM) at sentence-level. In this paper, we propose using a constrained word lattice, which encodes input phrases and TM constraints together, to combine SMT and TM at phrase-level. Experiments on English–Chinese and English–French show that our approach is significantly better than...
The RWTH Aachen System for NTCIR-9 PatentMT
This paper describes the statistical machine translation (SMT) systems developed by RWTH Aachen University for the Patent Translation task of the 9th NTCIR Workshop. Both phrase-based and hierarchical SMT systems were trained for the constrained Japanese-English and Chinese-English tasks. Experiments were conducted to compare different training data sets, training methods and optimization criter...
Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017
Japan Patent Information Organization (Japio) participates in patent subtasks (JPC-EJ/JE/CJ/KJ) with phrase-based statistical machine translation (SMT) and neural machine translation (NMT) systems which are trained with its own patent corpora in addition to the subtask corpora provided by organizers of WAT2017. In EJ and CJ subtasks, SMT and NMT systems whose sizes of training corpora are about...
Patent NMT integrated with Large Vocabulary Phrase Translation by SMT at WAT 2017
Neural machine translation (NMT) cannot handle a larger vocabulary because the training complexity and decoding complexity proportionally increase with the number of target words. This problem becomes even more serious when translating patent documents, which contain many technical terms that are observed infrequently. Long et al. (2017) proposed to select phrases that contain out-of-vocabulary...