The nested Indian buffet process for flexible topic modeling
Authors
Abstract
This paper presents a flexible topic model based on the nested Indian buffet process (nIBP). The flexibility comes from relaxing three constraints: (1) the number of topics is fixed, (2) topics are independent, and (3) the topic hierarchy for a document is limited to a single tree path. Bayesian nonparametric learning is used to build a tree model in which the number of topics and the topic hierarchies are learned automatically from the data. In particular, we propose the nIBP to construct a topic mixture model for representing heterogeneous documents, where the mixture components are flexibly selected from the tree nodes (dishes) that a document (customer) chooses in the Indian buffet process. The selection is performed in a nested, hierarchical manner. Experiments on document representation show the benefits of the proposed nIBP.
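The customer-and-dish metaphor in the abstract can be illustrated with a minimal sketch of the standard one-level Indian buffet process. This is not the paper's nested construction: the function name `sample_ibp`, the concentration parameter `alpha`, and the matrix `Z` are hypothetical names chosen for illustration. Each customer (document) takes an existing dish (topic) with probability proportional to its popularity, then tries a Poisson-distributed number of new dishes.

```python
import numpy as np

def sample_ibp(num_customers, alpha, rng):
    """Draw a binary customer-by-dish matrix from a standard one-level IBP.

    Illustrative only: the nested IBP in the paper applies a selection of
    this kind hierarchically over tree nodes, which is not modeled here.
    """
    dish_counts = []  # how many earlier customers took each dish
    rows = []         # dish indices chosen by each customer
    for i in range(1, num_customers + 1):
        chosen = []
        # Customer i takes existing dish k with probability m_k / i.
        for k, m_k in enumerate(dish_counts):
            if rng.random() < m_k / i:
                chosen.append(k)
        # Customer i then tries Poisson(alpha / i) brand-new dishes.
        num_new = rng.poisson(alpha / i)
        first_new = len(dish_counts)
        chosen.extend(range(first_new, first_new + num_new))
        dish_counts.extend([0] * num_new)
        for k in chosen:
            dish_counts[k] += 1
        rows.append(chosen)
    # Assemble the binary feature matrix (rows: customers, columns: dishes).
    Z = np.zeros((num_customers, len(dish_counts)), dtype=int)
    for i, chosen in enumerate(rows):
        Z[i, chosen] = 1
    return Z

Z = sample_ibp(num_customers=20, alpha=2.0, rng=np.random.default_rng(0))
print(Z.shape)
```

The number of columns in `Z` is not fixed in advance; it grows with the data, which is the sense in which the number of topics is learned rather than specified.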
Related resources
Slice sampling in nested IBP
We develop a nonparametric Bayesian method that explores the infinite space of latent features and finds the best subset in the sense of posterior probability. When the data appear in several groups, there should be different measures reflecting the differences between the groups. We formalize this as a nested Indian buffet process (nIBP) by assuming different measures according to the specific...
Posterior Contraction Rates of the Phylogenetic Indian Buffet Processes.
By expressing prior distributions as general stochastic processes, nonparametric Bayesian methods provide a flexible way to incorporate prior knowledge and constrain the latent structure in statistical inference. The Indian buffet process (IBP) is such an example that can be used to define a prior distribution on infinite binary features, where the exchangeability among subjects is assumed. The...
Focused Topic Models
We present the focused topic model (FTM), a family of nonparametric Bayesian models for learning sparse topic mixture patterns. The FTM integrates desirable features from both the hierarchical Dirichlet process (HDP) and the Indian buffet process (IBP) – allowing an unbounded number of topics for the entire corpus, while each document maintains a sparse distribution over these topics. We observ...
Restricted Indian buffet processes
Latent feature models are a powerful tool for modeling data with globally-shared features. Nonparametric exchangeable models such as the Indian Buffet Process offer modeling flexibility by letting the number of latent features be unbounded. However, current models impose implicit distributions over the number of latent features per data point, and these implicit distributions may not match our ...
Spectral Methods for Indian Buffet Process Inference
The Indian Buffet Process is a versatile statistical tool for modeling distributions over binary matrices. We provide an efficient spectral algorithm as an alternative to costly Variational Bayes and sampling-based algorithms. We derive a novel tensorial characterization of the moments of the Indian Buffet Process proper and for two of its applications. We give a computationally efficient itera...