Phrasal Rank-Encoding: Exploiting Phrase Redundancy and Translational Relations for Phrase Table Compression
Similar resources
Phrasal Rank-Encoding: Exploiting Phrase Redundancy and Translational Relations for Phrase Table Compression
We describe Phrasal Rank-Encoding (PR-Enc), a novel method for compressing word-aligned target language data in phrase tables as used in phrase-based SMT. This method reduces the redundancy in phrase tables that is a direct effect of the phrase-based approach. Combining PR-Enc with Huffman coding makes it possible to reduce the size of an aggressively compressed phrase table by another 39 per...
A Phrase Table without Phrases: Rank Encoding for Better Phrase Table Compression
This paper describes the first steps towards a minimum-size phrase table implementation for phrase-based statistical machine translation. The focus lies on reducing the size of target language data in a phrase table. Rank Encoding (REnc), a novel method for the compression of word-aligned target language data in phrase tables, is presented. Combined with Huffman coding, a relative size red...
Hierarchical Phrase Table Combination for Machine Translation
Typical statistical machine translation systems are batch-trained on a given set of training data, and their performance is largely influenced by the amount of data. As the available data grows across different domains, it is computationally demanding to repeat batch training every time new data arrives. To address this problem, we propose an efficient phrase table combination method. ...
TmTriangulate: A Tool for Phrase Table Triangulation
This work was supported by grants no. 645452 (QT21) and no. 644402 (HimL) of the EU and SVV 260 104 of the Czech Republic. We used language resources hosted by the LINDAT/CLARIN project LM2010013 of the Ministry of Education, Youth and Sports. Introduction. Under-resourced language pairs: scarcity of parallel corpora. SMT problem: no direct data → no SMT training; insufficient data → poor SMT per...
Phrase Table Training for Precision and Recall: What Makes a Good Phrase and a Good Phrase Pair?
In this work, the problem of extracting phrase translations is formulated as an information retrieval process, implemented with a log-linear model that aims for balanced precision and recall. We present a generic phrase training algorithm that is parameterized with feature functions and can be optimized jointly with the translation engine to directly maximize end-to-end system performance. Mu...
Journal
Journal title: The Prague Bulletin of Mathematical Linguistics
Year: 2012
ISSN: 1804-0462,0032-6585
DOI: 10.2478/v10108-012-0009-6