Summarizing Topics: From Word Lists to Phrases
Abstract
In this paper, we present a two-stage approach to generating descriptive phrases from the output of a statistical topic model, such as LDA [4]. First, we propose a Bayesian method for selecting statistically significant phrases from a corpus of documents, using inferred parameter values from LDA. Second, the selected phrases are combined with the topic assignments to produce a list of candidate phrases for each topic. These phrases are then ranked by descriptiveness, using a metric based on the weighted KL divergence between the topic probabilities implied by the phrase and those implied by the inferred parameter values from LDA.
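The ranking step can be illustrated with a minimal sketch. The abstract does not give the exact form of the weighted KL metric, so everything below is an assumption made for illustration: the function names (`kl_divergence`, `normalize`, `rank_phrases`), the additive smoothing, and the choice to weight each phrase's divergence by its number of occurrences are all hypothetical and may differ from the paper's definition. The input is taken to be, for each candidate phrase, the count of its occurrences assigned to each topic, plus a baseline topic distribution derived from the fitted model.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same topics.

    Assumes q has no zero entries wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def normalize(counts, smoothing=0.01):
    """Turn raw per-topic counts into a smoothed probability vector."""
    total = sum(counts) + smoothing * len(counts)
    return [(c + smoothing) / total for c in counts]


def rank_phrases(phrase_topic_counts, baseline):
    """Rank phrases by occurrence-weighted KL divergence from the baseline.

    phrase_topic_counts: dict mapping phrase -> list of per-topic counts
    baseline: topic distribution implied by the model (assumed given)
    The weighting by occurrence count is an illustrative assumption,
    not the paper's stated metric.
    """
    scored = []
    for phrase, counts in phrase_topic_counts.items():
        p = normalize(counts)
        weight = sum(counts)  # assumed weight: number of phrase occurrences
        scored.append((phrase, weight * kl_divergence(p, baseline)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two topics, two candidate phrases.
toy_counts = {
    "neural network": [9, 1],  # concentrated in topic 0
    "in the": [5, 5],          # spread evenly across topics
}
toy_baseline = [0.5, 0.5]
ranking = rank_phrases(toy_counts, toy_baseline)
```

In this toy setup a phrase whose topic assignments concentrate in one topic scores higher than a phrase spread evenly across topics, which matches the intuition that descriptive phrases should be topic-specific.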
Similar Resources
Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Topic Labels
Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against au...
Evaluating Visual Representations for Topic Understanding and Their Effects on Manually Generated Labels
Probabilistic topic models are important tools for indexing, summarizing, and analyzing large document collections by their themes. However, promoting end-user understanding of topics remains an open research problem. We compare labels generated by users given four topic visualization techniques (word lists, word lists with bars, word clouds, and network graphs) against each other and against au...
Visualizing Topics with Multi-Word Expressions
We describe a new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models. Our method finds significant n-grams related to a topic, which are then used to help understand and interpret the underlying distribution. Compared with the usual visualization, which simply lists the most probable topical terms, th...
Applying Word Sketches to Russian
The paper describes work on writing a Russian Sketch grammar for the system Sketch Engine. The objective of such a system is to provide lexicographers with sufficient lexical material and tools for getting information about a word’s collocability and to generate lists of the most frequent phrases for a given word, and then to classify them for appropriate syntactic models. The system will give ...
BlogPulse: Automated Trend Discovery for Weblogs
Over the past few years, weblogs have emerged as a new communication and publication medium on the Internet. In this paper, we describe the application of data mining, information extraction and NLP algorithms for discovering trends across our subset of approximately 100,000 weblogs. We publish daily lists of key persons, key phrases, and key paragraphs to a public web site, BlogPulse.com. In a...