Large-Scale Paraphrasing for Natural Language Understanding

Author

  • Juri Ganitkevitch
Abstract

We examine the application of data-driven paraphrasing to natural language understanding. We leverage bilingual parallel corpora to extract a large collection of syntactic paraphrase pairs, and introduce an adaptation scheme that allows us to tackle a variety of text transformation tasks via paraphrasing. We evaluate our system on the sentence compression task. Further, we use distributional similarity measures based on context vectors derived from large monolingual corpora to annotate our paraphrases with an orthogonal source of information. This yields significant improvements in our compression system’s output quality, achieving state-of-the-art performance. Finally, we propose a refinement of our paraphrases by classifying them into natural logic entailment relations. By extending the synchronous parsing paradigm towards these entailment relations, we will enable our system to perform recognition of textual entailment.
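
As a rough illustration of the distributional similarity idea mentioned in the abstract, the sketch below builds context vectors from a tokenized monolingual corpus and compares two phrases by cosine similarity. It is a minimal sketch under stated assumptions, not the system described in the paper: the function names `context_vector` and `cosine`, the single-token targets, the fixed window size, and the raw co-occurrence counts are all choices made here for illustration.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2):
    """Count the words that appear within `window` positions of `target`.

    Hypothetical helper: a real system would likely use richer features
    (n-grams, syntactic contexts) and much larger corpora.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, for illustration only.
corpus = "she sat on the couch and he sat on the sofa near the river bank".split()
score = cosine(context_vector(corpus, "couch"), context_vector(corpus, "sofa"))
```

A score like this could be attached to each extracted paraphrase pair as an additional feature, which is the role the abstract assigns to the monolingual signal.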

Similar resources

On-Demand Distributional Semantic Distance and Paraphrasing

Semantic distance measures aim to answer questions such as: How close in meaning are words A and B? For example: "couch" and "sofa"? (very); "wave" and "ripple"? (so-so); "wave" and "bank"? (far). Distributional measures do that by modeling which words occur next to A and next to B in large corpora of text, and then comparing these models of A and B (based on the "Distributional Hypothesis"). P...

Proceedings of the 10th European Workshop on Natural Language Generation (ENLG-05)

Probabilistic finite-state methods have been very successful for natural language processing (NLP) problems like tagging, entity identification, and transliteration. These methods have also been packaged in very useful software toolkits. However, they are not so good for attacking problems with large-scale reordering (translation, generation, paraphrasing, question answering, etc.) and sensitiv...

Building a Rich Large-scale Lexical Base for Generation

Most large lexical resources have been developed with language interpretation in mind and cannot be used directly for generation. We present a rich large-scale lexical base for generation, constructed by merging various linguistic resources. Our approach meets the needs of language generation systems by providing the facilities for mapping from semantic concepts to verb/sense pairs, for identi...

Strategies for effective paraphrasing

In this paper we present a new dimension to paraphrasing text in which characteristics of the original text motivate strategies for effective paraphrasing. Our system combines two existing robust components: the IRUS-II natural language understanding system and the SPOKESMAN generation system. We describe the architecture of the system and enhancements made to these components to facilitate ...

Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation

We present a method for acquiring reliable predicate-argument structures from raw corpora for automatic compilation of case frames. Such lexicon compilation requires highly reliable predicate-argument structures to practically contribute to Natural Language Processing (NLP) applications, such as paraphrasing, text entailment, and machine translation. We first apply chunking to raw corpora and t...

Publication date: 2013