Improve SMT with Source-Side “Topic-Document” Distributions
Authors
Abstract
Topic modeling is a popular framework for analyzing large text collections. Previous work on incorporating topic modeling into statistical machine translation (SMT) has relied mainly on the single major topic of the test document. In contrast, the proposed approaches cover not only the major topic but also sub-topics. The basic assumption of this paper is that the better the translation quality, the closer the “topic-document” distributions of the target-side and source-side documents. We first give some initial experimental results to support this assumption. We then recast the generation of such a target document as the selection of target-side sentences by an effective algorithm. A preliminary study shows that enforcing consistency between the target-side and source-side “topic-document” distributions in SMT can potentially improve translation quality.
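As a rough illustration of what comparing “topic-document” distributions could look like, the sketch below scores candidate target-side distributions against a source-side distribution and picks the closest one. The abstract does not name a similarity measure; Jensen-Shannon divergence is an assumption made here, and the function names (`kl_divergence`, `js_divergence`, `pick_closest`) and the toy distributions are hypothetical, not taken from the paper.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (assumed measure): symmetric, bounded in [0, 1] with log base 2."""
    # The mixture m is positive wherever p or q is positive, so KL(p || m) is always defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def pick_closest(source_dist, candidate_dists):
    """Return the index of the candidate distribution closest to the source distribution."""
    return min(
        range(len(candidate_dists)),
        key=lambda i: js_divergence(source_dist, candidate_dists[i]),
    )


# Toy example: three topics, one source-side document, two candidate translations.
source = [0.7, 0.2, 0.1]
candidates = [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]
best = pick_closest(source, candidates)
```

Under this assumption, a candidate whose topic mixture tracks the source (including its sub-topics, not only the dominant one) receives a lower divergence and is preferred.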
Related resources
Topic-Based Dissimilarity and Sensitivity Models for Translation Rule Selection
Translation rule selection is the task of selecting appropriate translation rules for an ambiguous source-language segment. As translation ambiguities are pervasive in statistical machine translation, we introduce two topic-based models for translation rule selection that incorporate global topic information into translation disambiguation. We associate each synchronous translation rule with so...
A Probabilistic Topic Model Based on Local Word Relations in Overlapping Windows
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents, to extract the topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
Bilingual LSA-based translation lexicon adaptation for spoken language translation
We present a bilingual LSA (bLSA) framework for translation lexicon adaptation. The idea is to apply marginal adaptation to a translation lexicon so that the lexicon marginals match the in-domain marginals. In the framework of speech translation, the bLSA method transfers topic distributions from the source to the target side, such that the translation lexicon can be adapted before translation ba...
Improve SMT Quality with Automatically Extracted Paraphrase Rules
We propose a novel approach to improve SMT via paraphrase rules which are automatically extracted from the bilingual training data. Without using extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of the target side. Besides the word and phrase paraphrases, the acquired paraphrase rules mainly cover the str...
Discourse for Machine Translation
Statistical Machine Translation is a modern success: Given a source language sentence, SMT finds the most probable target language sentence, based on (1) properties of the source; (2) probabilistic source--target mappings at the level of words, phrases and/or sub-structures; and (3) properties of the target language. SMT translates individual sentences because the search space even for a single...
Publication date: 2011