SMT Systems in the University of Tokyo for NTCIR-9 PatentMT
Authors
Abstract
                                   Train         Dev (MERT)   Test
EJ (Helios)
  # parallel sentences             2,963,963     2,000        2,000
  # En words                       86,048,310    63,825       E2J: 70,624; J2E: 69,521
  # Ja words (% particles)         98,923,854    72,987       E2J: 78,587; J2E: 74,070
EJ (Akamon)
  # parallel sentences             2,018,214     2,000        2,000
  Enju parse success rate          98.5%         98.8%        98.3%
  # En words                       49,474,332    63,825       70,624
  # Ja words (% particles)         53,271,286    73,462       73,984
CE
  # parallel sentences             999,950       2,000        2,000
  # Ch words                       37,656,651    73,318       54,228
  # En words                       42,347,290    77,547       58,172
Similar resources
System Description of BJTU-NLP SMT for NTCIR-9 PatentMT
This paper presents an overview of the statistical machine translation systems that BJTU-NLP developed for the NTCIR-9 Patent Machine Translation Task (NTCIR-9 PatentMT). We compared the performance of the phrase-based translation model and the factored translation model in our patent SMT systems for Chinese-to-English and English-to-Japanese translation. The factored translation model was proposed as an extended phrase-base...
ZZX_MT: the BeiHang MT System for NTCIR-9 PatentMT Task
In this paper, we describe the ZZX_MT machine translation system for the NTCIR-9 Patent Machine Translation Task (PatentMT). We participated in the Chinese-English translation subtask and submitted three results, which correspond to three different models or decoding algorithms respectively. Both of the first two are phrase-based SMT approaches integrating the BTG constraint into reordering models, and...
NTT-UT Statistical Machine Translation in NTCIR-9 PatentMT
This paper describes details of the NTT-UT system in the NTCIR-9 PatentMT task. One of its key technologies is system combination; the final translation hypotheses are chosen from n-bests by different SMT systems in a Minimum Bayes Risk (MBR) manner. Each SMT system includes different technology: syntactic pre-ordering, forest-to-string translation, and using external resources for domain adaptation a...
The RWTH Aachen System for NTCIR-9 PatentMT
This paper describes the statistical machine translation (SMT) systems developed by RWTH Aachen University for the Patent Translation task of the 9th NTCIR Workshop. Both phrase-based and hierarchical SMT systems were trained for the constrained Japanese-English and Chinese-English tasks. Experiments were conducted to compare different training data sets, training methods and optimization criter...
BBN's Systems for the Chinese-English Sub-task of the NTCIR-9 PatentMT Evaluation
This paper describes the work we conducted for building a statistical machine translation (SMT) system for the Chinese-English sub-task of the NTCIR-9 patent machine translation (MT) evaluation [17]. We first applied the various techniques on patent data that we had developed for improving SMT performance on other types of data. Our results show that most of the techniques work on patent documen...