Model-based clustering of microarray expression data via latent Gaussian mixture models

Authors

  • Paul D. McNicholas
  • Thomas Brendan Murphy
Abstract

MOTIVATION: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.

RESULTS: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.

AVAILABILITY: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info
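
The abstract reports clustering performance using the adjusted Rand index. As a rough illustration only, the sketch below computes that index from two label vectors with the standard pair-counting formula. The function name `adjusted_rand_index` and the toy labels are assumptions made for this example; they are not taken from the paper or its data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two partitions of the same items.

    Hypothetical helper for illustration; not code from the paper.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Numbers of item pairs that fall in the same cell / row / column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_rows + sum_cols)       # largest attainable agreement
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (made up): known classes versus an estimated clustering.
known = [0, 0, 1, 2]
estimated = [0, 0, 1, 1]
print(adjusted_rand_index(known, estimated))
```

A value of 1 indicates identical partitions up to relabelling, and values near 0 indicate agreement no better than chance, which is why the index suits comparing a fitted clustering against known sample classes.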


Similar resources

Model-Based Clustering for Expression Data via a Dirichlet Process Mixture Model

This chapter describes a clustering procedure for microarray expression data based on a well-defined statistical model, specifically, a conjugate Dirichlet process mixture model. The clustering algorithm groups genes whose latent variables governing expression are equal, that is, genes belonging to the same mixture component. The model is fit with Markov chain Monte Carlo and the computational ...


Clustering time-course Microarray data using functional Bayesian infinite mixture model

This paper presents a new Bayesian, infinite mixture model based, clustering approach specifically designed for time-course microarray data. The problem is to group together genes which have “similar” expression profiles given the set of noisy measurements of their expression levels over a specific time interval. In order to capture temporal variations of each curve, a nonparametric regression ...


Nonparametric Bayesian Clustering via Infinite Warped Mixture Models

We introduce a flexible class of mixture models for clustering and density estimation. Our model allows clustering of non-linearly-separable data, produces a potentially low-dimensional latent representation, automatically infers the number of clusters, and produces a density estimate. Our approach makes use of two tools from Bayesian nonparametrics: a Dirichlet process mixture model to allow a...


Variable selection in clustering via Dirichlet process mixture models

The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a model-based method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure...


Visualisation of Reduced-Dimension Microarray Data Using Gaussian Mixture Models

Dimensionality reduction, clustering and visualisation methods proposed in recent years have afforded new possibilities for the analysis of gene expression data. However, efficient, novel techniques for processing and representing microarray data are still required. We propose the use of the discrete cosine and sine transformations for dimensionality reduction of microarray data. These techniqu...



Journal title:
  • Bioinformatics

Volume 26, Issue 21

Pages: -

Publication date: 2010