Graph-based models for multi-document summarization
نویسنده
چکیده
University of Ljubljana Faculty of Computer and Information Science Ercan Canhasi Graph-based models for multi-document summarization is thesis is about automatic document summarization, with experimental results on general, query, update and comparative multi-document summarization (MDS). We describe prior work and our own improvements on some important aspects of a summarization system, including text modeling by means of a graph and sentence selection via archetypal analysis. e centerpiece of this work is a novel method for summarization that we call ”Archetypal Analysis Summarization”. Archetypal Analysis (AA) is a promising unsupervised learning tool able to completely assemble the advantages of clustering and the flexibility of matrix factorization. We propose a novel AA based summarization method based on following observations. In generic document summarization, given a graph representation of a set of documents, positively and/or negatively salient sentences are values on the data set boundary. To compute these extreme values, general or weighted archetypes, we choose to use archetypal analysis and weighted archetypal analysis, respectively. While each sentence in a data set is estimated as a mixture of archetypal sentences, the archetypes themselves are restricted to being sparse mixtures, i.e. convex combinations of the original sentences. Since AA in this way readily offers soft clustering and probabilistic ranking, we suggest considering it as a method for simultaneous sentence clustering and ranking. Another important argument in favor of using AA in MDS is that in contrast to other factorization methods which extract prototypical, characteristic, even basic sentences, AA selects distinct (archetypal) sentences, thus induces variability and diversity in produced summaries. Our research contributes by presenting some new modeling approaches based on graph notation which facilitate the text summarization task. We investigate the impact of using the content-graph and multi-element graph model for languageand domain-independent extractive multi-document generic and query focused summarization. We also propose the novel version of AA, the weighted Hier-
منابع مشابه
Multi-layered graph-based multi-document summarization model
Multi-document summarization is a process of automatic generation of a compressed version of the given collection of documents. Recently, the graph-based models and ranking algorithms have been actively investigated by the extractive document summarization community. While most work to date focuses on homogeneous connecteness of sentences and heterogeneous connecteness of documents and sentence...
متن کاملTimestamped Graphs: Evolutionary Models of Text for Multi-Document Summarization
Current graph-based approaches to automatic text summarization, such as LexRank and TextRank, assume a static graph which does not model how the input texts emerge. A suitable evolutionary text graph model may impart a better understanding of the texts and improve the summarization process. We propose a timestamped graph (TSG) model that is motivated by human writing and reading processes, and ...
متن کاملGeneric Multi-Document Summarization Using Topic-Oriented Information
The graph-based ranking models have been widely used for multi-document summarization recently. By utilizing the correlations between sentences, the salient sentences can be extracted according to the ranking scores. However, sentences are treated in a uniform way without considering the topic-level information in traditional methods. This paper proposes the topic-oriented PageRank (ToPageRank)...
متن کاملAn Exploration of Document Impact on Graph-Based Multi-Document Summarization
The graph-based ranking algorithm has been recently exploited for multi-document summarization by making only use of the sentence-to-sentence relationships in the documents, under the assumption that all the sentences are indistinguishable. However, given a document set to be summarized, different documents are usually not equally important, and moreover, different sentences in a specific docum...
متن کاملQuery-focused Multi-Document Summarization: Combining a Topic Model with Graph-based Semi-supervised Learning
Graph-based learning algorithms have been shown to be an effective approach for query-focused multi-document summarization (MDS). In this paper, we extend the standard graph ranking algorithm by proposing a two-layer (i.e. sentence layer and topic layer) graph-based semi-supervised learning approach based on topic modeling techniques. Experimental results on TAC datasets show that by considerin...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014