Multi Document Centroid-based Text Summarization
Abstract
Text summarization is the process of taking a text document and producing a compressed version that retains the information most useful to the user. Summarizers are either single-document (SDS) or multi-document (MDS), and multi-document summarization is considerably harder than single-document summarization. Several factors make it more difficult. The articles may be written by different authors, with different writing styles and document structures. They may present contradictory views of the same event, and a useful summarizer should be able to detect and handle this situation. Facts and views can also change over time, so documents written at different times may contain conflicting information.
Similar resources
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
LexPageRank: Prestige in Multi-Document Text Summarization
Multidocument extractive summarization relies on the concept of sentence centrality to identify the most important sentences in a document. Centrality is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We are now considering an approach for computing sentence importance based on the concept of eigenvector centrali...
Revisiting the Centroid-based Method: A Strong Baseline for Multi-Document Summarization
The centroid-based model for extractive document summarization is a simple and fast baseline that ranks sentences based on their similarity to a centroid vector. In this paper, we apply this ranking to possible summaries instead of sentences and use a simple greedy algorithm to find the best summary. Furthermore, we show possibilities to scale up to larger input document collections by selectin...
Centroid-based summarization of multiple documents
We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies tha...
Text Summarization Using Cuckoo Search Optimization Algorithm
Today, with the rapid growth of the World Wide Web and the creation of Internet sites and online text resources, text summarization has attracted considerable attention from researchers. Extractive text summarization is an important summarization method that consists of selecting the top representative sentences from the input document. When we are facing large-volume documents, the extr...
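Several of the abstracts listed here describe ranking sentences by their similarity to a centroid vector. The following is a minimal sketch of that idea, not the implementation of MEAD or of any paper listed here. It assumes a bag-of-words representation with a smoothed TF-IDF weighting and cosine similarity; the function names `tokenize` and `rank_by_centroid` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, very simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_centroid(sentences):
    """Return (score, sentence) pairs, highest similarity to the centroid first."""
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(bags)

    # Sentence-level document frequency: in how many sentences each term occurs.
    df = Counter()
    for bag in bags:
        df.update(bag.keys())

    # Smoothed inverse document frequency (assumption: log(n / df) + 1).
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    # TF-IDF vector for each sentence.
    vectors = [{t: c * idf[t] for t, c in bag.items()} for bag in bags]

    # The centroid is the average of all sentence vectors.
    centroid = Counter()
    for vec in vectors:
        for t, w in vec.items():
            centroid[t] += w / n

    scored = [(cosine(vec, centroid), s) for vec, s in zip(vectors, sentences)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = [
        "The storm hit the coast on Monday.",
        "The storm damaged homes along the coast.",
        "Officials said the storm closed coast roads.",
        "Bananas are yellow.",
    ]
    for score, sentence in rank_by_centroid(docs):
        print(f"{score:.3f}  {sentence}")
```

In a multi-document setting, the sentences of all input articles would be pooled before ranking, and the top-scoring sentences taken as the extract. This sketch does nothing about redundancy or contradictory sources, which the abstract above names as central difficulties.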