Automatic generation of large-scale paraphrases
Abstract
Research on paraphrase has mostly focussed on lexical or syntactic variation within individual sentences. Our concern is with larger-scale paraphrases, ranging from multiple sentences or paragraphs to entire documents. In this paper we address the problem of generating paraphrases of large chunks of text. We ground our discussion in a worked example: extending an existing NLG system to accept a source text as input and to generate a range of fluent, semantically equivalent alternatives that vary not only at the lexical and syntactic levels but also in document structure and layout.
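To make the idea of variation "in document structure and layout" concrete, here is a minimal toy sketch. It is not the authors' NLG system. It assumes a hypothetical input in which each message already comes with a few wordings that are treated as semantically equivalent, and it assumes two hypothetical layouts (running paragraph and bulleted list). The names `MESSAGES`, `as_paragraph`, `as_bullets` and `paraphrases`, and the sample sentences, are invented for illustration. A real system would derive the alternatives from a semantic representation rather than from a hand-written list.

```python
from itertools import product

# Hypothetical input: each inner list holds alternative wordings of one message,
# assumed (not checked) to be semantically equivalent. Sample text is invented.
MESSAGES = [
    ["Take one tablet daily.", "Take a single tablet each day."],
    ["Do not exceed the stated dose.", "Never take more than the stated dose."],
]


def as_paragraph(sentences):
    """Layout 1: run the sentences together as a single paragraph."""
    return " ".join(sentences)


def as_bullets(sentences):
    """Layout 2: present each sentence as its own bullet point."""
    return "\n".join(f"- {s}" for s in sentences)


LAYOUTS = [as_paragraph, as_bullets]


def paraphrases(messages, layouts):
    """Yield every combination of one wording per message and one layout."""
    for wording in product(*messages):
        for layout in layouts:
            yield layout(list(wording))


variants = list(paraphrases(MESSAGES, LAYOUTS))
print(len(variants))  # 2 wordings x 2 wordings x 2 layouts = 8
print(variants[0])
print(variants[1])
```

The point of the sketch is that lexical choice and layout are independent dimensions, so the number of candidate paraphrases grows multiplicatively; a practical generator would need constraints or ranking to pick among them.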
Similar resources
The Open University's repository of research publications and other research outputs: Automatic generation of large-scale paraphrases
Automatically Constructing a Corpus of Sentential Paraphrases
An obstacle to research in automatic paraphrase identification and generation is the lack of large-scale, publicly available labeled corpora of sentential paraphrases. This paper describes the creation of the recently released Microsoft Research Paraphrase Corpus, which contains 5801 sentence pairs, each hand-labeled with a binary judgment as to whether the pair constitutes a paraphrase. The cor...
Creating and using large monolingual parallel corpora for sentential paraphrase generation
In this paper we investigate the automatic generation of paraphrases using machine translation techniques. We make three contributions: the construction of a large paraphrase corpus for English and Dutch, a re-ranking heuristic for using machine translation in paraphrase generation, and a proper evaluation methodology. A large parallel corpus is constructed by aligning clustered headlines th...
Paraphrasing Headlines by Machine Translation
In this paper we investigate the automatic collection, generation and evaluation of sentential paraphrases. News article headlines are a valuable source of paraphrases: they tend to describe the same event in various ways, and they can easily be obtained from the web. We describe a method for generating paraphrases using a large aligned monolingual corpus of news headlines acquired autom...
Extracting Paraphrases from a Parallel Corpus
While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single word lexical paraphrases as well as sy...