Combining Multiple Resources to Improve SMT-based Paraphrasing Model

نویسندگان

  • Shiqi Zhao
  • Cheng Niu
  • Ming Zhou
  • Ting Liu
  • Sheng Li
چکیده

This paper proposes a novel method that exploits multiple resources to improve statistical machine translation (SMT) based paraphrasing. In detail, a phrasal paraphrase table and a feature function are derived from each resource, which are then combined in a log-linear SMTmodel for sentence-level paraphrase generation. Experimental results show that the SMT-based paraphrasing model can be enhanced using multiple resources. The phrase-level and sentence-level precision of the generated paraphrases are above 60% and 55%, respectively. In addition, the contribution of each resource is evaluated, which indicates that all the exploited resources are useful for generating paraphrases of high quality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation

Out-of-vocabulary (OOV) word is a crucial problem in statistical machine translation (SMT) with low resources. OOV paraphrasing that augments the translation model for the OOV words by using the translation knowledge of their paraphrases has been proposed to address the OOV problem. In this paper, we propose using word embeddings and semantic lexicons for OOV paraphrasing. Experiments conducted...

متن کامل

A Hybrid Machine Translation System Based on a Monotone Decoder

In this paper, a hybrid Machine Translation (MT) system is proposed by combining the result of a rule-based machine translation (RBMT) system with a statistical approach. The RBMT uses a set of linguistic rules for translation, which leads to better translation results in terms of word ordering and syntactic structure. On the other hand, SMT works better in lexical choice. Therefore, in our sys...

متن کامل

EBMT, SMT, hybrid and more: ATR spoken language translation system

This paper introduces ATR’s project named Corpus-Centered Computation (C3), which aims at developing a translation technology suitable for spoken language translation. C3 places corpora at the center of its technology. Translation knowledge is extracted from corpora, translation quality is gauged by referring to corpora, the best translation among multiple-engine outputs is selected based on co...

متن کامل

The Circle of Meaning: from Translation to Paraphrasing and Back

Title of dissertation: THE CIRCLE OF MEANING: FROM TRANSLATION TO PARAPHRASING AND BACK Nitin Madnani, Doctor of Philosophy, 2010 Dissertation directed by: Professor Bonnie Dorr Department of Computer Science The preservation of meaning between inputs and outputs is perhaps the most ambitious and, often, the most elusive goal of systems that attempt to process natural language. Nowhere is this ...

متن کامل

CLARIFY: Human-Powered Training of SMT Models

We present CLARIFY, an augmented environment that aims to improve the quality of translations generated by phrase-based statistical machine translation systems through learning from humans. CLARIFY employs four types of knowledge input: 1) direct input 2) results from either a word alignment game, 3) an phrase alignment game, or 4) a paraphrasing game. All of these knowledge inputs elicit user ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008