Phrase Segmentation Model using Collocation and Translational Entropy
نویسندگان
چکیده
In this paper, we propose a phrase segmentation model for the phrase-based statistical machine translation. We observed that good translation candidates generated by a conventional phrase-based SMT decoder have lexical cohesion and show more uniform translation for each phrase segment. Based on the observation, we propose a novel phrase segmentation model using collocation between two adjacent words and translation entropy of phrase segments. Experimental results show that the proposed model significantly improves the translation quality in both English-to-Korean and English-to-Chinese translation tasks.
منابع مشابه
Using collocation segmentation to extract translation units in a phrase-based statistical machine translation system
This report evaluates the impact of using a novel collocation segmentation method for phrase extraction in the standard phrase-based statistical machine translation approach. The collocation segmentation technique is implemented simultaneously in the source and target side. The resulting collocation segmentation is used to extract translation units. Experiments are reported in the Spanish-toEng...
متن کاملUsing collocation segmentation to extract translation units in a phrase-based statistical machine translation system Implementación de una segmentación estad́ıstica complementaria para extraer unidades de traducción en un sistema de traducción estad́ıstico basado en frases
This report evaluates the impact of using a novel collocation segmentation method for phrase extraction in the standard phrase-based statistical machine translation approach. The collocation segmentation technique is implemented simultaneously in the source and target side. The resulting collocation segmentation is used to extract translation units. Experiments are reported in the Spanish-toEng...
متن کاملUsing Collocation Segmentation to Augment the Phrase Table
This paper describes the 2010 phrase-based statistical machine translation system developed at the TALP Research Center of the UPC in cooperation with BMIC and VMU. In phrase-based SMT, the phrase table is the main tool in translation. It is created extracting phrases from an aligned parallel corpus and then computing translation model scores with them. Performing a collocation segmentation ove...
متن کاملIntegration of statistical collocation segmentations in a phrase-based statistical machine translation system
This study evaluates the impact of integrating two different collocation segmentations methods in a standard phrase-based statistical machine translation approach. The collocation segmentation techniques are implemented simultaneously in the source and target side. Each resulting collocation segmentation is used to extract translation units. Experiments are reported in the English-to-Spanish Bi...
متن کاملTCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models
This paper presents a hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge. The algorithm first extracts all the noun phrase collocations from a shallow parsed corpus by using syntactic knowledge in the form of phrase rules. It then removes pseudo collocations by using a set of statistic-based association measures (...
متن کامل