mLDM: a new hierarchical Bayesian statistical model for sparse microbial association discovery

Authors

  • Yuqing Yang
  • Ning Chen
  • Ting Chen
Abstract

Interpretive analysis of metagenomic data depends on understanding the underlying associations among microbes in metagenomic samples. Although several statistical tools have been developed for metagenomic association studies, they either suffer from compositional bias or fail to account for environmental factors that directly affect the composition of a microbial community. In this paper, we propose the metagenomic Lognormal-Dirichlet-Multinomial (mLDM) model, a hierarchical Bayesian model with sparsity constraints that bypasses compositional bias and discovers new associations among microbes and between microbes and environmental factors. The mLDM model can 1) infer both conditionally dependent associations among microbes and direct associations between microbes and environmental factors; 2) account for both the compositional bias and the variance of metagenomic data; and 3) estimate absolute abundances of microbes. Conditionally dependent associations capture the direct relationships underlying microbial pairs and remove the indirect connections induced by other common factors. Empirical studies on both synthetic data and the TARA Oceans eukaryotic data show the effectiveness of the mLDM model in comparison with several state-of-the-art methods. Finally, mLDM is applied to western English Channel data and finds some interesting associations.
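The distinction the abstract draws between conditionally dependent (direct) associations and indirect ones induced by a common factor can be illustrated with a small toy example. The sketch below is not the mLDM inference procedure; it assumes plain Gaussian data, and the variable names (`common`, `a`, `b`) and the helper `partial_correlations` are hypothetical, introduced only for illustration. Two variables driven by one shared factor show a strong marginal correlation, while their partial correlation, read off the inverse covariance (precision) matrix, is close to zero.

```python
import numpy as np


def partial_correlations(samples):
    """Partial correlation matrix computed from the inverse sample covariance.

    samples: array of shape (n_samples, n_variables).
    Entry (i, j) is the correlation between variables i and j after
    conditioning on all remaining variables.
    """
    precision = np.linalg.inv(np.cov(samples, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


# Toy data (an assumption for illustration): `a` and `b` do not interact
# directly; both respond to a shared driver `common`, which could stand in
# for an environmental factor.
rng = np.random.default_rng(0)
n = 5000
common = rng.normal(size=n)
a = common + 0.5 * rng.normal(size=n)
b = common + 0.5 * rng.normal(size=n)
data = np.column_stack([a, b, common])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)

print("marginal corr(a, b):", round(float(marginal[0, 1]), 3))
print("partial corr(a, b | common):", round(float(partial[0, 1]), 3))
```

In this toy setting the marginal correlation between `a` and `b` is large even though neither influences the other, and conditioning on `common` removes it. A sparsity constraint on the precision matrix, as the abstract mentions for mLDM, would push such near-zero entries to exactly zero.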


Similar resources

Sparse Statistical Modelling in Gene Expression Genomics

The concept of sparsity is more and more central to practical data analysis and inference with increasingly high-dimensional data. Gene expression genomics is a key example context. As part of a series of projects that has developed Bayesian methodology for large-scale regression, ANOVA and latent factor models, we have extended traditional Bayesian “variable selection” priors and modelling ide...

MT-HESS: an efficient Bayesian approach for simultaneous association detection in OMICS datasets, with application to eQTL mapping in multiple tissues

MOTIVATION: Analysing the joint association between a large set of responses and predictors is a fundamental statistical task in integrative genomics, exemplified by numerous expression Quantitative Trait Loci (eQTL) studies. Of particular interest are the so-called 'hotspots', important genetic variants that regulate the expression of many genes. Recently, attention has focussed on whether...

A Sparse Bayesian Model for Dependence Analysis of Extremes: Climate Applications

In many real applications, such as climate, finance and social media among others, we are often interested in extreme events. An important part of modeling extremes is discovery of covariates on which the quantities related to the extremes are dependent, as this may lead to improved understanding and the discovery of new causal drivers of extremes. Despite developments in sparse covariate disco...

Sparse Bayesian Linear Models: Computational Advances and Applications in Epidemiology

Aalto University, P.O. Box 11000, FI-00076 Aalto, www.aalto.fi. Author: Tomi Peltola. Name of the doctoral dissertation: Sparse Bayesian Linear Models: Computational Advances and Applications in Epidemiology. Publisher: School of Science. Unit: Department of Biomedical Engineering and Computational Science. Series: Aalto University publication series DOCTORAL DISSERTATIONS 206/2014. Field of research: Compu...

Analysis of Hierarchical Bayesian Models for Large Space Time Data of the Housing Prices in Tehran

Housing prices are correlated with their location across neighborhoods, so the data exhibit spatial correlation. Prices also vary from month to month, so they exhibit temporal correlation as well. Spatio-temporal models are used to analyze this type of data. An important purpose of reviewing this type of data is to fit a suitable model for the spatial-temporal an...

Publication date: 2016