Modal Clustering in a Univariate Class of Product Partition Models
Abstract
This paper presents an algorithm for finding the maximum a posteriori (MAP) clustering in a class of univariate product partition models. Although the number of possible clusterings of n observations grows as the Bell (exponential) number, the dynamic programming algorithm presented here exploits properties of the model to provide an O(n^2) search. Hence, the algorithm can find the MAP clustering for tens of thousands of univariate data points, whereas previously the MAP clustering could only be approximated through a stochastic search. Integrating over the latent location variables in a Dirichlet Process mixture (DPM) model leads to a product partition model. The paper shows that several univariate, conjugate DPM models satisfy the conditions for the mode-finding algorithm. The clustering algorithm is demonstrated using data from a microarray experiment to detect differential gene expression involving 4,608 genes.
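As an illustration of how an O(n^2) dynamic program over clusterings can work, the following is a minimal sketch and not the paper's implementation. It assumes (the abstract does not spell out the conditions) that the search can be restricted to partitions of the sorted data into contiguous blocks, and that the log posterior is a sum of per-cluster terms. The function names, the normal model with known variance `sigma2`, the normal prior `N(mu0, tau2)` on each cluster mean, and the DPM cohesion `alpha * Gamma(|S|)` are all illustrative assumptions.

```python
import math
from itertools import accumulate


def make_normal_log_score(x, sigma2=0.1, tau2=10.0, mu0=0.0, alpha=1.0):
    """Return log_score(i, j) for the cluster x[i:j] of sorted data x.

    Assumed model (illustrative only): y | theta ~ N(theta, sigma2),
    theta ~ N(mu0, tau2), with cohesion alpha * Gamma(cluster size).
    Prefix sums make each call O(1).
    """
    centered = [v - mu0 for v in x]
    s1 = [0.0] + list(accumulate(centered))                   # prefix sums
    s2 = [0.0] + list(accumulate(v * v for v in centered))    # prefix sums of squares

    def log_score(i, j):
        m = j - i
        s = s1[j] - s1[i]
        q = s2[j] - s2[i]
        denom = sigma2 + m * tau2
        # Log marginal likelihood with the cluster mean integrated out.
        log_marginal = (
            -0.5 * m * math.log(2.0 * math.pi * sigma2)
            + 0.5 * math.log(sigma2 / denom)
            - 0.5 * (q - tau2 * s * s / denom) / sigma2
        )
        log_cohesion = math.log(alpha) + math.lgamma(m)
        return log_cohesion + log_marginal

    return log_score


def map_contiguous_partition(x, log_score):
    """Best partition of sorted x into contiguous blocks, in O(n^2) score calls.

    best[j] holds the best total log score for the first j points;
    back[j] holds the start index of the last block in that solution.
    """
    n = len(x)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            candidate = best[i] + log_score(i, j)
            if candidate > best[j]:
                best[j] = candidate
                back[j] = i
    # Walk the back-pointers to recover the blocks.
    clusters = []
    j = n
    while j > 0:
        i = back[j]
        clusters.append(list(x[i:j]))
        j = i
    clusters.reverse()
    return best[n], clusters


if __name__ == "__main__":
    data = sorted([5.1, 0.0, 0.2, 5.0, 0.1, 5.2])
    score, clusters = map_contiguous_partition(data, make_normal_log_score(data))
    print(score, clusters)
```

The double loop visits each (i, j) pair once, so the cost is quadratic in n as long as each cluster score is computed in constant time, which is why the sketch uses prefix sums. Whether the contiguous-block optimum is the global MAP clustering depends on the model conditions established in the paper.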
Similar resources
Modal Clustering in Univariate, Conjugate Dirichlet Process Mixture Models
The Dirichlet Process mixture (DPM) model is a popular nonparametric Bayesian tool for modeling unknown distributions through mixtures of components. Integrating out the latent location variables in a DPM model leads to a product partition model. This paper describes a mode-finding algorithm which quickly finds either the maximizer of the partition posterior or the maximizer of the partition lik...
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
Modeling Stock Market Volatility Using Univariate GARCH Models: Evidence from Bangladesh
This paper investigates the nature of volatility characteristics of stock returns in the Bangladesh stock markets employing daily all share price index return data of Dhaka Stock Exchange (DSE) and Chittagong Stock Exchange (CSE) from 02 January 1993 to 27 January 2013 and 01 January 2004 to 20 August 2015 respectively. Furthermore, the study explores the adequate volatility model for the stoc...
Improving Decision Trees by Clustering
Multi-modal classification problems arise in many fields and form an important class of problems. The presence of disjoint areas for each class creates special problems for techniques that cannot partition each class into more than one region. Among the various techniques that have been applied with some success to multi-modal problems are decision tree classifiers (DTCs) and back propagation n...
A Predictive View of Bayesian Clustering
This work considers probability models for partitions of a set of n elements using a predictive approach, i.e., models that are specified in terms of the conditional probability of either joining an already existing cluster or forming a new one. The inherent structure can be motivated by resorting to hierarchical models of either parametric or nonparametric nature. Parametric examples include t...