Towards Statistical Paraphrase Generation: Preliminary Evaluations of Grammaticality
Abstract
Summary sentences are often paraphrases of existing sentences. They may be made up of recycled fragments of text taken from important sentences in an input document. We investigate the use of a statistical sentence generation technique that recombines words probabilistically in order to create new sentences. Given a set of event-related sentences, we use an extended version of the Viterbi algorithm which employs dependency relation and bigram probabilities to find the most probable summary sentence. Using precision and recall metrics for verb arguments as a measure of grammaticality, we find that our system performs better than a bigram baseline, producing fewer spurious verb arguments.
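As a rough illustration of the bigram baseline mentioned above, the sketch below runs a plain Viterbi search over a bigram model estimated from a handful of input sentences. It is not the authors' implementation: it omits the dependency-relation extension entirely, and the add-alpha smoothing, the fixed target length, and the names `bigram_log_probs` and `viterbi_sentence` are assumptions made for the example.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical boundary markers


def bigram_log_probs(sentences, alpha=1.0):
    """Return an add-alpha smoothed bigram log-probability function and the word list."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = {END}
    for sent in sentences:
        tokens = [START] + sent + [END]
        vocab.update(sent)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    size = len(vocab)

    def logp(prev, cur):
        total = sum(counts[prev].values())
        return math.log((counts[prev][cur] + alpha) / (total + alpha * size))

    return logp, sorted(vocab - {END})


def viterbi_sentence(sentences, length):
    """Find the most probable word sequence of the given length under the bigram model."""
    logp, words = bigram_log_probs(sentences)
    # best[w] holds (log-probability, path) of the best partial sentence ending in w.
    best = {w: (logp(START, w), [w]) for w in words}
    for _ in range(length - 1):
        nxt = {}
        for w in words:
            score, path = max(
                ((s + logp(p, w), pth) for p, (s, pth) in best.items()),
                key=lambda t: t[0],
            )
            nxt[w] = (score, path + [w])
        best = nxt
    # Close the sentence with the end marker and pick the best complete path.
    score, path = max(
        ((s + logp(w, END), pth) for w, (s, pth) in best.items()),
        key=lambda t: t[0],
    )
    return path, score


corpus = [["the", "dog", "barked"], ["the", "dog", "slept"], ["a", "dog", "barked"]]
best_path, best_score = viterbi_sentence(corpus, 3)
print(" ".join(best_path), best_score)
```

A bigram-only search of this kind sees one previous word at a time, so nothing stops it from attaching arguments to a verb that never took them in the input. That is the kind of spurious verb argument the dependency-aware extension is reported to reduce.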
Similar resources
Searching for Grammaticality: Propagating Dependencies in the Viterbi Algorithm
In many text-to-text generation scenarios (for instance, summarisation), we encounter human-authored sentences that could be composed by recycling portions of related sentences to form new sentences. In this paper, we couch the generation of such sentences as a search problem. We investigate a statistical sentence generation method which recombines words to form new sentences. We propose an exte...
A Probabilistic Model for Measuring Grammaticality and Similarity of Automatically Generated Paraphrases of Predicate Phrases
The most critical issue in generating and recognizing paraphrases is the development of wide-coverage paraphrase knowledge. Previous work on paraphrase acquisition has collected lexicalized pairs of expressions; however, the results do not ensure full coverage of the various paraphrase phenomena. This paper focuses on productive paraphrases realized by general transformation patterns, and addresses...
Comparing Phrase-based and Syntax-based Paraphrase Generation
Paraphrase generation can be regarded as machine translation where source and target language are the same. We use the Moses statistical machine translation toolkit for paraphrasing, comparing phrase-based to syntax-based approaches. Data is derived from a recently released, large scale (2.1M tokens) paraphrase corpus for Dutch. Preliminary results indicate that the phrase-based approach perfor...
Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection
Linguistically diverse datasets are critical for training and evaluating robust machine learning systems, but data collection is a costly process that often requires experts. Crowdsourcing the process of paraphrase generation is an effective means of expanding natural language datasets, but there has been limited analysis of the trade-offs that arise when designing tasks. In this paper, we pres...
Sentential Paraphrase Generation for Agglutinative Languages Using SVM with a String Kernel
Paraphrase generation is widely used for various natural language processing (NLP) applications such as question answering, multi-document summarization, and machine translation. In this study, we identify the problems occurring in the process of applying existing probabilistic model-based methods to agglutinative languages, and provide solutions by reflecting the inherent characteristics of ag...