Multi-Candidate Reduction for Flexible Single-Document Summarization
Abstract
Sentence compression techniques based on linguistically-motivated syntactic rules have proved effective in single-document summarization tasks. The addition of topic terms yields state-of-the-art performance, according to previous evaluations. Since “trimming” rules must be applied successively, optimal rule ordering presents a challenge. This paper describes the Multi-Candidate Reduction (MCR) framework, which addresses this issue by simultaneously generating many compressed variants from multiple starting points. A weighted feature-based technique, optimized using previous test data, is then used to select from the generated candidates. The MCR framework enhances the flexibility of summarization systems and improves output quality: we obtain scores that are significantly higher for some ROUGE measures than the highest scores previously reported.
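The generate-then-select idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three token-level rules, the two features (`topic_coverage`, `length_gap`), and the weights are all hypothetical stand-ins, whereas the paper's trimming rules are syntactic and its weights are tuned on earlier test data. The sketch only shows how applying rules in every order yields many candidates, from which a weighted feature score picks one.

```python
from typing import Callable, Dict, List, Set

Tokens = List[str]
Rule = Callable[[Tokens], Tokens]

# Hypothetical toy rules; real trimming rules would operate on a parse tree.
def drop_leading_adverbial(tokens: Tokens) -> Tokens:
    """Drop a short sentence-initial phrase that ends in a comma."""
    if "," in tokens[:3]:
        return tokens[tokens.index(",") + 1:]
    return tokens

def drop_trailing_clause(tokens: Tokens) -> Tokens:
    """Cut everything from the last comma onward."""
    if "," in tokens:
        last_comma = len(tokens) - 1 - tokens[::-1].index(",")
        return tokens[:last_comma]
    return tokens

def drop_determiners(tokens: Tokens) -> Tokens:
    """Remove articles."""
    return [t for t in tokens if t.lower() not in {"a", "an", "the"}]

def generate_candidates(tokens: Tokens, rules: List[Rule]) -> List[Tokens]:
    """Apply the rules in every possible order and keep each distinct result,
    instead of committing to one fixed rule ordering."""
    start = tuple(tokens)
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for rule in rules:
            reduced = tuple(rule(list(current)))
            if reduced and reduced not in seen:
                seen.add(reduced)
                frontier.append(reduced)
    return [list(c) for c in seen]

def score(candidate: Tokens, topic_terms: Set[str], target_len: int,
          weights: Dict[str, float]) -> float:
    """Weighted linear combination of (assumed) features."""
    features = {
        "topic_coverage": sum(1 for t in candidate if t.lower() in topic_terms),
        "length_gap": -abs(len(candidate) - target_len),
    }
    return sum(weights[name] * value for name, value in features.items())

def select_best(candidates: List[Tokens], topic_terms: Set[str],
                target_len: int, weights: Dict[str, float]) -> Tokens:
    return max(candidates,
               key=lambda c: score(c, topic_terms, target_len, weights))

sentence = "Yesterday , the committee approved the budget , despite objections".split()
rules = [drop_leading_adverbial, drop_trailing_clause, drop_determiners]
candidates = generate_candidates(sentence, rules)
best = select_best(candidates, {"committee", "budget"}, target_len=3,
                   weights={"topic_coverage": 1.0, "length_gap": 0.5})
print(" ".join(best))
```

Because every rule ordering is explored, the selection step rather than a hand-fixed ordering decides which compression wins; changing the weights or the target length changes the chosen candidate without touching the rules.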
Similar resources
Multi-candidate reduction: Sentence compression as a tool for document summarization tasks
This article examines the application of two single-document sentence compression techniques to the problem of multi-document summarization—a “parse-and-trim” approach and a statistical noisy-channel approach. We introduce the Multi-Candidate Reduction (MCR) framework for multi-document summarization, in which many compressed candidates are generated for each source sentence. These candidates a...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Sentence Reduction Algorithms to Improve Multi-document Summarization
Multi-document summarization aims to create a single summary based on the information conveyed by a collection of texts. After the candidate sentences have been identified and ordered, it is time to select which will be included in the summary. In this paper, we describe an approach that uses sentence reduction, both lexical and syntactic, to help improve the compression step in the summarizati...
Multilingual Summarization: Dimensionality Reduction and a Step Towards Optimal Term Coverage
In this paper we present three term weighting approaches for multi-lingual document summarization and give results on the DUC 2002 data as well as on the 2013 Multilingual Wikipedia feature articles data set. We introduce a new interval-bounded nonnegative matrix factorization. We use this new method, latent semantic analysis (LSA), and latent Dirichlet allocation (LDA) to give three term-weight...
Multi-Document Summarization via Discriminative Summary Reranking
Existing multi-document summarization systems usually rely on a specific summarization model (i.e., a summarization method with a specific parameter setting) to extract summaries for different document sets with different topics. However, according to our quantitative analysis, none of the existing summarization models can always produce high-quality summaries for different document sets, and e...