Focused Topic Models
Authors
Abstract
We present the focused topic model (FTM), a family of nonparametric Bayesian models for learning sparse topic mixture patterns. The FTM integrates desirable features from both the hierarchical Dirichlet process (HDP) and the Indian buffet process (IBP): it allows an unbounded number of topics for the entire corpus, while each document maintains a sparse distribution over these topics. We observe that the HDP assumes a correlation between the global and within-document prevalences of a topic, and note that such a relationship may be undesirable. By using an IBP to select which topics contribute to a document, and an unnormalized Dirichlet process to determine how much of the document is generated by each selected topic, the FTM decouples these probabilities, allowing for more flexible modeling. Experimental results on three text corpora demonstrate superior performance over the hierarchical Dirichlet process topic model.
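The decoupling described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' construction or inference procedure. It assumes a finite truncation to a fixed number of topics, a beta-Bernoulli approximation in place of the IBP, and gamma-distributed topic weights standing in for the unnormalized Dirichlet process. The function name, parameter names, and default values are all hypothetical and do not come from the abstract.

```python
import random


def sample_focused_topic_proportions(num_docs, num_topics, alpha=2.0, gamma=5.0, seed=0):
    """Illustrative finite sketch: a binary mask selects topics, separate weights set proportions."""
    rng = random.Random(seed)

    # Corpus-level quantities shared by all documents.
    # pi[k]: probability that topic k is switched on in a document
    # (assumption: finite beta-Bernoulli approximation to the IBP).
    pi = [rng.betavariate(alpha / num_topics, 1.0) for _ in range(num_topics)]
    # phi[k]: relative weight of topic k when it is present. It is drawn
    # independently of pi[k], so a rare topic can still dominate a document.
    phi = [rng.gammavariate(gamma, 1.0) for _ in range(num_topics)]

    docs = []
    for _ in range(num_docs):
        # Select which topics contribute to this document.
        mask = [1 if rng.random() < pi[k] else 0 for k in range(num_topics)]
        if not any(mask):
            # Assumption for the sketch: force one topic on so the document is non-empty.
            mask[rng.randrange(num_topics)] = 1

        # Dirichlet draw restricted to the selected topics,
        # built from normalized gamma variates.
        weights = [rng.gammavariate(phi[k], 1.0) if mask[k] else 0.0
                   for k in range(num_topics)]
        total = sum(weights)
        if total == 0.0:
            # Numerical guard: fall back to uniform weights over selected topics.
            weights = [float(b) for b in mask]
            total = sum(weights)
        theta = [w / total for w in weights]
        docs.append((mask, theta))
    return pi, phi, docs


if __name__ == "__main__":
    pi, phi, docs = sample_focused_topic_proportions(num_docs=3, num_topics=8)
    for mask, theta in docs:
        print(mask, [round(t, 3) for t in theta])
```

In this sketch each document's proportions are exactly zero outside its selected topics, which is the sparsity the abstract refers to, while `pi` (how often a topic appears) and `phi` (how much of a document it takes up when present) are sampled separately.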
Similar resources
A review of text mining approaches and their function in discovering and extracting a topic
Background and aim: Four text mining methods are examined, with a focus on understanding and identifying their properties and limitations in subject discovery. Methodology: The study is an analytical review of the literature on text mining and topic modeling. Findings: LSA could be used to classify specific and unique topics in documents that address only a single topic. The other three text min...
Gibbs Sampling for Logistic Normal Topic Models with Graph-Based Priors
Previous work on probabilistic topic models has either focused on models with relatively simple conjugate priors that support Gibbs sampling or models with non-conjugate priors that typically require variational inference. Gibbs sampling is more accurate than variational inference and better supports the construction of composite models. We present a method for Gibbs sampling in non-conjugate l...
Investigating Retrieval Performance with Manually-Built Topic Models
Modeling text with topics is currently a popular research area in both Machine Learning and Information Retrieval (IR). Most of this research has focused on automatic methods though there are many hand-crafted topic resources available online. In this paper we investigate retrieval performance with topic models constructed manually based on a hand-crafted directory resource. The original query ...
Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality
Topic models based on latent Dirichlet allocation and related methods are used in a range of user-focused tasks including document navigation and trend analysis, but evaluation of the intrinsic quality of the topic model and topics remains an open research area. In this work, we explore the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and pr...
Polylingual Topic Models
Statistical topic models are a useful tool for analyzing large, unstructured document collections [1, 2]. Such collections are increasingly available in multiple languages. Previous work on bilingual topic modeling [4] has focused on aligning pairs of translated sentences. In contrast, we consider “loosely parallel” corpora, in which tuples of documents in different languages are not direct tra...
Publication date: 2009