A More Cohesive Summarizer
Abstract
We have developed COHSUM, an extraction-based single-document summarizer that aims for cohesive output by using the coreference links in a document. Sentences that contain the most references to other sentences, and that are most often referred to by other sentences, are considered the most important and are therefore extracted. Before evaluating summary quality, we performed a corpus analysis of the original documents in the dataset to investigate the distribution of coreferences. Summary quality is evaluated in terms of content coverage and cohesion. Content coverage is measured by comparing the summaries to manually created gold standards, and cohesion is measured by counting the broken and intact coreferences in each summary compared to the original text. The summarizer is compared to the summarizers from DUC 2002 and to a baseline consisting of the first 100 words. The results show that COHSUM, which aims only at maintaining a cohesive text, performed better than the other summarizers regarding text cohesion and on par with the other summarizers and the baseline regarding content coverage.
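As a rough illustration of the idea in the abstract, the sketch below ranks sentences by how many coreference links they take part in and then counts intact and broken links in the resulting extract. It is not COHSUM's actual implementation: the function names, the `(source, target)` link format, the word-budget selection, and the definition of a "broken" link (a kept sentence whose antecedent sentence was dropped) are all assumptions made for this example, and the coreference links are supplied by hand rather than produced by a resolver.

```python
from collections import Counter


def rank_sentences(links, n_sentences):
    """Rank sentence indices by coreference degree (assumed scoring).

    links: iterable of (source, target) sentence-index pairs, meaning a
    mention in sentence `source` refers to an entity in sentence `target`.
    """
    score = Counter()
    for src, tgt in links:
        if src == tgt:
            continue  # links inside one sentence say nothing about other sentences
        score[src] += 1  # the sentence refers to another sentence
        score[tgt] += 1  # the sentence is referred to by another sentence
    # Highest score first; ties broken by document order.
    return sorted(range(n_sentences), key=lambda i: (-score[i], i))


def extract_summary(sentences, links, max_words):
    """Greedily pick top-ranked sentences that fit a word budget (assumed)."""
    chosen, used = [], 0
    for i in rank_sentences(links, len(sentences)):
        n_words = len(sentences[i].split())
        if used + n_words > max_words:
            continue
        chosen.append(i)
        used += n_words
    return sorted(chosen)  # restore document order


def coreference_integrity(links, kept):
    """Count (intact, broken) links among the kept sentences.

    Assumed definition: a link is intact if both sentences are kept, and
    broken if the referring sentence is kept but its target is not.
    """
    kept = set(kept)
    intact = broken = 0
    for src, tgt in links:
        if src not in kept:
            continue  # the referring mention is not in the summary at all
        if tgt in kept:
            intact += 1
        else:
            broken += 1
    return intact, broken


# Hand-made toy document; the links are illustrative, not resolver output.
sentences = [
    "Anna founded a company.",
    "She hired two engineers.",
    "The weather was mild.",
    "They built the first prototype.",
    "It shipped in May.",
]
links = [(1, 0), (3, 1), (4, 3)]

summary_ids = extract_summary(sentences, links, max_words=10)
print(summary_ids, coreference_integrity(links, summary_ids))
```

On this toy input the extract keeps the two most connected sentences, and the integrity count shows that one of their references ("She") has lost its antecedent, which is the kind of cohesion damage the evaluation described above is meant to expose.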
Similar resources
The effects of analysing cohesion on document summarisation
We argue that in general, the analysis of lexical cohesion factors in a document can drive a summarizer, as well as enable other content characterization tasks. More narrowly, this paper focuses on how one particular cohesion factor, simple lexical repetition, can enhance an existing sentence extraction summarizer, by enabling strategies for overcoming some particularly jarring end-user effects in...
Lexical cohesion, discourse segmentation and document summarization
Summaries automatically derived by sentence extraction are known to exhibit some coherence degradation, readability deterioration, and topical under-representation. We propose a strategy for improving upon these problems, aiming to generate more cohesive summaries by analyzing the lexical cohesion factors in the source document texts. As an initial experiment, we have looked at one particular f...
Cohesion and coherence for Automatic Summarization
This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only slight improvement in the resulting s...
Integrating cohesion and coherence for Automatic Summarization
This paper presents the integration of cohesive properties of text with coherence relations, to obtain an adequate representation of text for automatic summarization. A summarizer based on Lexical Chains is enhanced with rhetorical and argumentative structure obtained via Discourse Markers. When evaluated on a newspaper corpus, this integration yields only slight improvement in the resulting s...
Enhancing extraction based summarization with outside word space
We present results from improving vector space based extraction summarizers. The summarizer uses Random Indexing and PageRank to extract the sentences whose importance is ranked highest for a document, based on vector similarity. Originally the summarizer used only word vectors based on the words in the document to be summarized. By using a larger word space model the performance of the sum...