Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models

نویسندگان

  • Paul D. McNicholas
  • Thomas Brendan Murphy
  • Aaron F. McDaid
  • Dermot Frost
چکیده

Model-based clustering using a family of Gaussian mixture models, with parsimonious factor analysis-like covariance structure, is described and an efficient algorithm for its implementation is presented. This algorithm uses the alternating expectationconditional maximization (AECM) variant of the expectation-maximization (EM) algorithm. Two central issues around the implementation of this family of models, namely model selection and convergence criteria, are discussed. These central issues also have implications for other model-based clustering techniques and for the implementation of techniques like the EM algorithm, in general. The Bayesian information criterion (BIC) is used for model selection and Aitken’s acceleration, which is shown to outperform the lack of progress criterion, is used to determine convergence. A brief introduction to parallel computing is then given before the implementation of this algorithm in parallel is facilitated within the master-slave paradigm. A simulation study is then carried out to confirm the effectiveness of this parallelization. The resulting software is applied to two data sets to demonstrate its effectiveness when compared to existing software.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Model-based clustering of microarray expression data via latent Gaussian mixture models

MOTIVATION In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microa...

متن کامل

Dirichlet Process Parsimonious Mixtures for clustering

The parsimonious Gaussian mixture models, which exploit an eigenvalue decomposition of the group covariance matrices of the Gaussian mixture, have shown their success in particular in cluster analysis. Their estimation is in general performed by maximum likelihood estimation and has also been considered from a parametric Bayesian prospective. We propose new Dirichlet Process Parsimonious mixtur...

متن کامل

Parsimonious Gaussian mixture models

Parsimonious Gaussian mixture models are developed using a latent Gaussian model which is closely related to the factor analysis model. These models provide a unified modeling framework which includes the mixtures of probabilistic principal component analyzers and mixtures of factor of analyzers models as special cases. In particular, a class of eight parsimonious Gaussian mixture models which ...

متن کامل

Hypothesis Testing for Parsimonious Gaussian Mixture Models

Gaussian mixture models with eigen-decomposed covariance structures make up the most popular family of mixture models for clustering and classification, i.e., the Gaussian parsimonious clustering models (GPCM). Although the GPCM family has been used for almost 20 years, selecting the best member of the family in a given situation remains a troublesome problem. Likelihood ratio tests are develop...

متن کامل

clustvarsel: A Package Implementing Variable Selection for Model-based Clustering in R

Finite mixture modelling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide clustering information. This enables the selection of a more parsimonious model, yielding more efficient estimates, a clearer interpretation and, often, imp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computational Statistics & Data Analysis

دوره 54  شماره 

صفحات  -

تاریخ انتشار 2010