Large-Scale Paraphrasing for Natural Language Understanding
Abstract
We examine the application of data-driven paraphrasing to natural language understanding. We leverage bilingual parallel corpora to extract a large collection of syntactic paraphrase pairs, and introduce an adaptation scheme that allows us to tackle a variety of text transformation tasks via paraphrasing. We evaluate our system on the sentence compression task. Further, we use distributional similarity measures based on context vectors derived from large monolingual corpora to annotate our paraphrases with an orthogonal source of information. This yields significant improvements in the output quality of our compression system, achieving state-of-the-art performance. Finally, we propose a refinement of our paraphrases by classifying them into natural logic entailment relations. By extending the synchronous parsing paradigm to these entailment relations, we will enable our system to recognize textual entailment.
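The abstract does not spell out how paraphrases are extracted from bilingual corpora or how distributional similarity is attached to them. One widely used way to do the former is bilingual pivoting, where two English phrases that translate to the same foreign phrase are treated as paraphrase candidates and scored as p(e2 | e1) = sum over f of p(e2 | f) p(f | e1). The sketch that follows assumes that technique and is not presented as the author's exact system. It works on flat phrase strings rather than the syntactic rules the abstract describes, and it attaches a cosine similarity between context vectors as a second, independent score. The function names `pivot_paraphrases`, `cosine`, and `annotate` are hypothetical, and the translation tables, German pivot phrases, and context counts are invented toy data.

```python
import math
from collections import defaultdict

# Assumption: paraphrases come from pivoting over phrase translation tables,
# and context vectors are sparse co-occurrence counts. All data is toy data.


def pivot_paraphrases(e2f, f2e):
    """Score paraphrase candidates by pivoting through a foreign language.

    e2f[e1][f] = p(f | e1) and f2e[f][e2] = p(e2 | f) are hypothetical
    phrase tables. Returns p(e2 | e1) = sum_f p(e2 | f) * p(f | e1),
    skipping the trivial identity paraphrase e2 == e1.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for e1, foreign in e2f.items():
        for f, p_f_given_e1 in foreign.items():
            for e2, p_e2_given_f in f2e.get(f, {}).items():
                if e2 != e1:
                    scores[e1][e2] += p_e2_given_f * p_f_given_e1
    return {e1: dict(cands) for e1, cands in scores.items()}


def cosine(u, v):
    """Cosine similarity of two sparse context vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def annotate(e1, candidates, context):
    """Attach a distributional similarity score to each pivot candidate.

    Returns (e2, p_pivot, similarity) tuples sorted by pivot probability.
    """
    rows = [
        (e2, p, cosine(context.get(e1, {}), context.get(e2, {})))
        for e2, p in candidates.items()
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Toy tables; all phrases and numbers are invented for illustration.
e2f = {"thrown into jail": {"festgenommen": 0.6, "inhaftiert": 0.4}}
f2e = {
    "festgenommen": {"arrested": 0.7, "thrown into jail": 0.3},
    "inhaftiert": {"imprisoned": 0.5, "thrown into jail": 0.5},
}
context = {
    "thrown into jail": {"police": 2, "suspect": 2, "prison": 1},
    "arrested": {"police": 3, "suspect": 2, "yesterday": 1},
    "imprisoned": {"prison": 2, "years": 3, "suspect": 1},
}

paras = pivot_paraphrases(e2f, f2e)
results = annotate("thrown into jail", paras["thrown into jail"], context)
for e2, p, sim in results:
    print(f"{e2:12s} p={p:.2f} sim={sim:.2f}")
```

In this toy setup "arrested" receives both the higher pivot probability and the higher context similarity. Keeping the two scores as separate fields, rather than folding them into one number, mirrors the abstract's idea of distributional similarity as an orthogonal annotation that a downstream system can weigh on its own.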
Publication date: 2013