Biology Based Alignments of Paraphrases for Sentence Compression
نویسندگان
چکیده
1 In this paper, we present a study for extracting and aligning paraphrases in the context of Sentence Compression. First, we justify the application of a new measure for the automatic extraction of paraphrase corpora. Second, we discuss the work done by (Barzilay & Lee, 2003) who use clustering of paraphrases to induce rewriting rules. We will see, through classical visualization methodologies (Kruskal & Wish, 1977) and exhaustive experiments, that clustering may not be the best approach for automatic pattern identification. Finally, we will provide some results of different biology based methodologies for pairwise paraphrase alignment.
منابع مشابه
External Plagiarism Detection based on Human Behaviors in Producing Paraphrases of Sentences in English and Persian Languages
With the advent of the internet and easy access to digital libraries, plagiarism has become a major issue. Applying search engines is one of the plagiarism detection techniques that converts plagiarism patterns to search queries. Generating suitable queries is the heart of this technique and existing methods suffer from lack of producing accurate queries, Precision and Speed of retrieved result...
متن کاملCreating Disjunctive Logical Forms from Aligned Sentences for Grammar-Based Paraphrase Generation
We present a method of creating disjunctive logical forms (DLFs) from aligned sentences for grammar-based paraphrase generation using the OpenCCG broad coverage surface realizer. The method takes as input word-level alignments of two sentences that are paraphrases and projects these alignments onto the logical forms that result from automatically parsing these sentences. The projected alignment...
متن کاملParaphrastic Sentence Compression with a Character-based Metric: Tightening without Deletion
We present a substitution-only approach to sentence compression which “tightens” a sentence by reducing its character length. Replacing phrases with shorter paraphrases yields paraphrastic compressions as short as 60% of the original length. In support of this task, we introduce a novel technique for re-ranking paraphrases extracted from bilingual corpora. At high compression rates1 paraphrasti...
متن کاملRule Induction for Sentence Reduction
The field of Automatic Sentence Reduction has been an active research topic, with several relevant approaches being recently proposed. However, in our view many milestones still need to be reached in order to approach human-like quality sentence simplification. In this work, we propose a new framework, which processes huge sets of web news stories and learns sentence reduction rules in a fully ...
متن کاملLearning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its a...
متن کامل