Clusters and Features from Combinatorial Stochastic Processes

Author

  • Tamara Broderick
Abstract

In partitioning (a.k.a. clustering) data, we associate each data point with one and only one of some collection of groups called clusters or partition blocks. Here, we formally establish an analogous problem, called feature allocation, for associating data points with arbitrary non-negative integer numbers of groups, now called features or topics. Just as the exchangeable partition probability function (EPPF) can be used to describe the distribution of cluster membership under an exchangeable clustering model, we examine an analogous "exchangeable feature probability function" for certain types of feature models. Moreover, recalling Kingman's paintbox theorem as a characterization of the class of exchangeable clustering models, we develop a similar "feature paintbox" characterization of the class of exchangeable feature models. We examine models such as the Bayesian nonparametric Indian buffet process as examples within this broader class.
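To make the contrast between a partition and a feature allocation concrete, the following is a minimal sketch, not taken from the paper. It assumes the Chinese restaurant process as an illustrative exchangeable clustering model (the abstract does not name it) and uses the standard sequential "buffet" description of the Indian buffet process. The function names and the parameters `alpha` and `gamma` are hypothetical choices made for this example.

```python
import math
import random


def sample_crp(n, alpha, rng):
    """Sample a partition of points 0..n-1 (assumed model: Chinese restaurant process)."""
    blocks = []  # each block is a list of data-point indices
    for i in range(n):
        # Point i joins an existing block b with probability len(b) / (i + alpha)
        # and starts a new block with probability alpha / (i + alpha).
        u = rng.random() * (i + alpha)
        acc = 0.0
        for block in blocks:
            acc += len(block)
            if u < acc:
                block.append(i)
                break
        else:
            blocks.append([i])
    return blocks


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) variate with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def sample_ibp(n, gamma, rng):
    """Sample a feature allocation of points 0..n-1 (Indian buffet process)."""
    features = []  # each feature is a list of data-point indices
    for i in range(n):
        # Point i joins each existing feature f independently
        # with probability len(f) / (i + 1).
        for f in features:
            if rng.random() < len(f) / (i + 1):
                f.append(i)
        # Point i also opens Poisson(gamma / (i + 1)) brand-new features.
        for _ in range(sample_poisson(gamma / (i + 1), rng)):
            features.append([i])
    return features


if __name__ == "__main__":
    rng = random.Random(0)
    print("partition:", sample_crp(8, 1.0, rng))
    print("feature allocation:", sample_ibp(8, 2.0, rng))
```

In the partition, every index appears in exactly one block. In the feature allocation, an index may appear in no feature, one feature, or several, which is the "arbitrary non-negative integer numbers of groups" behaviour the abstract describes.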


Similar resources

Cluster and Feature Modeling from Combinatorial Stochastic Processes

One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models are a set of interrelated stochastic processes, most notably the Dirichlet proce...


Considering Stochastic and Combinatorial Optimization

Here, issues connected with characteristic stochastic practices are considered. In the first part, the plausibility of covering the arrangements of an improvement issue on subjective subgraphs is studied. The impulse for this strategy is a state where an advancement issue must be settled as often as possible for discretionary illustrations. Then, a preprocessing stage is considered that would q...


A Statistical Study of two Diffusion Processes on Torus and Their Applications

Diffusion Processes such as Brownian motions and Ornstein-Uhlenbeck processes are the classes of stochastic processes that have been investigated by researchers in various disciplines including biological sciences. It is usually assumed that the outcomes of these processes are laid on the Euclidean spaces. However, some data in physical, chemical and biological phenomena indicate that they cann...


Evolution of Compact-Binary Populations in Globular Clusters: A Boltzmann Study II. Introducing Stochasticity

We continue exploration of the Boltzmann scheme started in Banerjee and Ghosh (2007, henceforth Paper I) for studying the evolution of compact-binary populations of globular clusters, introducing in this paper our method of handling the stochasticity inherent in dynamical processes of binary formation, destruction and hardening in globular clusters. We describe these stochastic processes as Wie...


Clustering From Categorical Data Sequences

The three-parameter cluster model is a combinatorial stochastic process that generates categorical response sequences by randomly perturbing a fixed clustering parameter. This clear relationship between the observed data and the underlying clustering is particularly attractive in cluster analysis, in which supervised learning is a common goal and missing data is a familiar issue. The model is w...



Publication date: 2014