Using the Expectation Maximization Algorithm with Heterogeneous Mixture Components for the Analysis of Spectrometry Data

نویسندگان

  • Dominik Kopczynski
  • Sven Rahmann
چکیده

Coupling a multi-capillary column (MCC) with an ion mobility (IM) spectrometer (IMS) opened a multitude of new application areas for gas analysis, especially in a medical context, as volatile organic compounds (VOCs) in exhaled breath can hint at a person’s state of health. To obtain a potential diagnosis from a raw MCC/IMS measurement, several computational steps are necessary, which so far have required manual interaction, e.g., human evaluation of discovered peaks. We have recently proposed an automated pipeline for this task that does not require human intervention during the analysis. Nevertheless, there is a need for improved methods for each computational step. In comparison to gas chromatography / mass spectrometry (GC/MS) data, MCC/IMS data is easier and less expensive to obtain, but peaks are more diffuse and there is a higher noise level. MCC/IMS measurements can be described as samples of mixture models (i.e., of convex combinations) of two-dimensional probability distributions. So we use the expectation-maximization (EM) algorithm to deconvolute mixtures in order to develop methods that improve data processing in three computational steps: denoising, baseline correction and peak clustering. A common theme of these methods is that mixture components within one model are not homogeneous (e.g., all Gaussian), but of different types. Evaluation shows that the novel methods outperform the existing ones. We provide Python software implementing all three methods and make our evaluation data available at http://www.rahmannlab.de/research/ims.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Land Cover Classification for Polarimetric SAR Images Based on Mixture Models

In this paper, two mixture models are proposed for modeling heterogeneous regions in single-look and multi-look polarimetric SAR images, along with their corresponding maximum likelihood classifiers for land cover classification. The classical Gaussian and Wishart models are suitable for modeling scattering vectors and covariance matrices from homogeneous regions, while their performance deteri...

متن کامل

Unsupervised learning of regression mixture models with unknown number of components

Regression mixture models are widely studied in statistics, machine learning and data analysis. Fitting regression mixtures is challenging and is usually performed by maximum likelihood by using the expectation-maximization (EM) algorithm. However, it is well-known that the initialization is crucial for EM. If the initialization is inappropriately performed, the EM algorithm may lead to unsatis...

متن کامل

A Mixture Model of Two Different Distributions Approach to the Analysis of Heterogeneous Survival Data

In this paper we propose a mixture of two different distributions such as Exponential-Gamma, Exponential-Weibull and Gamma-Weibull to model heterogeneous survival data. Various properties of the proposed mixture of two different distributions are discussed. Maximum likelihood estimations of the parameters are obtained by using the EM algorithm. Illustrative example based on real data are also g...

متن کامل

Analysis of LC-MS Data Using Probabilistic-Based Mixture Regression Models Analyse von LC-MS-Daten mit wahrscheinlichkeitsbasierter Mischung von Regressionsmodellen

A novel framework of a probabilistic-based mixture regression model (PMRM) is presented for alignment of multiple liquid chromatography-mass spectrometry (LC-MS) data with respect to retention time (RT) and mass-to-charge ratio (m/z). The expectation maximization algorithm is used to estimate the joint parameters of spline-based mixture regression models and prior transformation density models....

متن کامل

Clustering with Beta Divergences

Clustering algorithms start with a fixed divergence, which captures the possibly asymmetric distance between a sample and a centroid. In the mixture model setting, the sample distribution plays the same role. When all attributes have the same topology and dispersion, the data are said to be homogeneous. If the prior knowledge of the distribution is inaccurate or the set of plausible distributio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1405.5501  شماره 

صفحات  -

تاریخ انتشار 2014