topic model

The Polylingual Labeled Topic Model

2015

Lisa Posch, Arnim Bleier, Philipp Schaer, Markus Strohmaier

In this paper, we present the Polylingual Labeled Topic Model, a model which combines the characteristics of the existing Polylingual Topic Model and Labeled LDA. The model accounts for multiple languages with separate topic distributions for each language while restricting the permitted topics of a document to a set of predefined labels. We explore the properties of the model in a two-language...

A Neural Autoregressive Topic Model

2012

Hugo Larochelle, Stanislas Lauly

We describe a new model for learning meaningful representations of text documents from an unlabeled collection of documents. This model is inspired by the recently proposed Replicated Softmax, an undirected graphical model of word counts that was shown to learn a better generative model and more meaningful document representations. Specifically, we take inspiration from the conditional mean-fie...

Concept-Based Topic Model Improvement

2011

Claudiu Cristian Musat, Julien Velcin, Marian-Andrei Rizoiu, Stefan Trausan-Matu

We propose a system which employs conceptual knowledge to improve topic models by removing unrelated words from the simplified topic description. We use WordNet to detect which topical words are not conceptually similar to the others and then test our assumptions against human judgment. Results obtained on two different corpora in different test conditions show that the words detected as unrela...

Topic Compositional Neural Language Model

Journal: CoRR, 2017

Wenlin Wang, Zhe Gan, Wenqi Wang, Dinghan Shen, Jiaji Huang, Wei Ping, Sanjeev Satheesh, Lawrence Carin

We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word-ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language mo...

Bigram Anchor Words Topic Model

2016

Arseniy Ashuha, Natalia V. Loukachevitch

A probabilistic topic model is a modern statistical tool for document collection analysis that allows extracting a number of topics in the collection and describes each document as a discrete probability distribution over topics. Classical approaches to statistical topic modeling can be quite effective in various tasks, but the generated topics may be too similar to each other or poorly interpr...

Bayesian latent topic clustering model

2008

Meng-Sung Wu, Jen-Tzung Chien

Document modeling is important for document retrieval and categorization. Probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) are popular paradigms of document models in which word/document correlations are inferred through latent topics. In PLSA and LDA, the unseen words and documents are not explicitly represented at the same time. Model generalization is constrain...

Factorized Multi-Modal Topic Model

2012

Seppo Virtanen, Yangqing Jia, Arto Klami, Trevor Darrell

Multi-modal data collections, such as corpora of paired images and text snippets, require analysis methods beyond single-view component and topic models. For continuous observations the current dominant approach is based on extensions of canonical correlation analysis, factorizing the variation into components shared by the different modalities and those private to each of them. For count data,...

The Inverse Regression Topic Model

2014

Maxim Rabinovich, David M. Blei

Taddy (2013) proposed multinomial inverse regression (MNIR) as a new model of annotated text based on the influence of metadata and response variables on the distribution of words in a document. While effective, MNIR has no way to exploit structure in the corpus to improve its predictions or facilitate exploratory data analysis. On the other hand, traditional probabilistic topic models (like la...

On-Line Labeled Topic Model

2016

YongHeng Chen, Yaojin Lin, Hao Yue

A large number of electronic documents are labeled using human-interpretable annotations. High-efficiency text mining on such data sets requires a generative model that can flexibly comprehend the significance of observed labels while simultaneously uncovering topics within unlabeled documents. This paper presents a novel and generalized on-line labeled topic model (OLT) tracking the time developm...

Topic Tracking with Dynamic Topic Model and Topic-based Weighting Method

Journal: JSW, 2010

Xiaoyan Zhang, Ting Wang

In topic tracking, a topic is usually described by several stories. How to represent a topic remains a difficult problem in topic tracking research. To emphasize the topic in stories, we provide an improved topic-based tf*idf weighting method to measure the topical importance of the features in the representation model. To overcome the topic drift problem and filter the nois...
