Sub-sentencial Paraphrasing by Contextual Pivot Translation

نویسنده

  • Aurélien Max
چکیده

The ability to generate or to recognize paraphrases is key to the vast majority of NLP applications. As correctly exploiting context during translation has been shown to be successful, using context information for paraphrasing could also lead to improved performance. In this article, we adopt the pivot approach based on parallel multilingual corpora proposed by (Bannard and Callison-Burch, 2005), which finds short paraphrases by finding appropriate pivot phrases in one or several auxiliary languages and back-translating these pivot phrases into the original language. We show how context can be exploited both when attempting to find pivot phrases, and when looking for the most appropriate paraphrase in the original subsentential “envelope”. This framework allows the use of paraphrasing units ranging from words to large sub-sentential fragments for which context information from the sentence can be successfully exploited. We report experiments on a text revision task, and show that in these experiments our contextual sub-sentential paraphrasing system outperforms a strong baseline system.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improved Statistical Machine Translation with Hybrid Phrasal Paraphrases Derived from Monolingual Text and a Shallow Lexical Resource

Paraphrase generation is useful for various NLP tasks. But pivoting techniques for paraphrasing have limited applicability due to their reliance on parallel texts, although they benefit from linguistic knowledge implicit in the sentence alignment. Distributional paraphrasing has wider applicability, but doesn’t benefit from any linguistic knowledge. We combine a distributional semantic distance...

متن کامل

Paraphrasing with Bilingual Parallel Corpora

Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrasebased statistical machine translation, we show how paraphrases in one language can be identified using a phrase in another language as a pivot. We define a para...

متن کامل

Web-based Validation for Contextual Targeted Paraphrasing

In this work, we present a scenario where contextual targeted paraphrasing of sub-sentential phrases is performed automatically to support the task of text revision. Candidate paraphrases are obtained from a preexisting repertoire and validated in the context of the original sentence using information derived from the Web. We report on experiments on French, where the original sentences to be r...

متن کامل

Leveraging Diverse Sources in Statistical Machine Translation

Statistical machine translation is often faced with the problem of having insufficient training data for many language pairs. In this thesis, several methods have been proposed to leverage other sources to enhance the quality of machine translation systems. Particularly, we propose approaches suitable in these four scenarios: 1. when an additional parallel corpus between the source and the targ...

متن کامل

Exploring Key Concept Paraphrasing Based on Pivot Language Translation for Question Retrieval

Question retrieval in current community-based question answering (CQA) services does not, in general, work well for long and complex queries. One of the main difficulties lies in the word mismatch between queries and candidate questions. Existing solutions try to expand the queries at word level, but they usually fail to consider concept level enrichment. In this paper, we explore a pivot langu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009