Unsupervised Learning of Paraphrases
نویسندگان
چکیده
Paraphrasing constitutes a corner stone in many Natural Language Processing fields like monolingual text-to-text generation and automatic text summarization. Indeed, aligned monolingual corpora are likely to boost the learning process of text-to-text generation models. A Paraphrase learning strategy can be defined as a two-step process: (1) identifying and extracting related sentence pairs from on-line comparable corpora (for example sentences that convey the same information but yet are written in different forms) and (2) applying learning methodologies over the extracted material to induce text-to-text rewriting rules. In this paper, we compare different lexical distance metrics for the identification of related sentences, i.e. paraphrase candidates. In particular, we discuss how different metrics lead to the identification of different types of paraphrases. Finally, the comparisons and discussions give relevant insights towards automatic generation of paraphrase corpora.
منابع مشابه
Extracting Paraphrases from a Parallel Corpus
While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single word lexical paraphrases as well as sy...
متن کاملLearning Paraphrases to Improve a Question-Answering System
In this paper, we present a nearly unsupervised learning methodology for automatically extracting paraphrases from the Web. Starting with one single linguistic expression of a semantic relationship, our learning algorithm repeatedly samples the Web, in order to build a corpus of potential new examples of the same relationship. Sampling steps alternate with validation steps, during which implaus...
متن کاملUnsupervised Metaphor Paraphrasing using a Vector Space Model
We present the first fully unsupervised approach to metaphor interpretation, and a system that produces literal paraphrases for metaphorical expressions. Such a form of interpretation is directly transferable to other NLP applications that can benefit from a metaphor processing component. Our method is different from previous work in that it does not rely on any manually annotated data or lexic...
متن کاملAligning Needles in a Haystack: Paraphrase Acquisition Across the Web
This paper presents a lightweight method for unsupervised extraction of paraphrases from arbitrary textual Web documents. The method differs from previous approaches to paraphrase acquisition in that 1) it removes the assumptions on the quality of the input data, by using inherently noisy, unreliable Web documents rather than clean, trustworthy, properly formatted documents; and 2) it does not ...
متن کاملLearning Paraphrases from WNS Corpora
Paraphrase detection can be seen as the task of aligning sentences that convey the same information but yet are written in different forms. Such resources are important to automatically learn text-to-text rewriting rules. In this paper, we present a new metric for unsupervised detection of paraphrases and apply it in the context of clustering of paraphrases. An exhaustive evaluation is conducte...
متن کاملParaphrase Alignment for Synonym Evidence Discovery
We describe a new unsupervised approach for synonymy discovery by aligning paraphrases in monolingual domain corpora. For that purpose, we identify phrasal terms that convey most of the concepts within domains and adapt a methodology for the automatic extraction and alignment of paraphrases to identify paraphrase casts from which valid synonyms are discovered. Results performed on two different...
متن کامل