mLDM: a new hierarchical Bayesian statistical model for sparse microbial association discovery
Abstract
Interpretive analysis of metagenomic data depends on understanding the underlying associations among the microbes in metagenomic samples. Several statistical tools have been developed for metagenomic association studies, but they either suffer from compositional bias or fail to account for environmental factors that directly affect the composition of a microbial community. In this paper, we propose metagenomic Lognormal-Dirichlet-Multinomial (mLDM), a hierarchical Bayesian model with sparsity constraints. mLDM bypasses compositional bias and discovers new associations both among microbes and between microbes and environmental factors. The mLDM model can 1) infer both conditionally dependent associations among microbes and direct associations between microbes and environmental factors; 2) account for both the compositional bias and the variance of metagenomic data; and 3) estimate the absolute abundances of microbes. Conditionally dependent associations capture the direct relationship between a pair of microbes and remove indirect connections induced by other common factors. Empirical studies on both synthetic data and the TARA Oceans eukaryotic data demonstrate the effectiveness of mLDM in comparison with several state-of-the-art methods. Finally, mLDM is applied to data from the western English Channel, where it uncovers several interesting associations.
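To show how the three layers named in the model (lognormal, Dirichlet, multinomial) can fit together, the sketch below simulates read counts from such a hierarchy. The abstract does not give the exact parameterization, so this sketch is built on assumptions. It assumes that latent log-abundance effects follow z_i ~ N(μ, Θ⁻¹) with a sparse microbe precision matrix Θ. It assumes that environmental factors x_i enter linearly through a sparse coefficient matrix B, giving α_i = exp(Bᵀx_i + z_i). It assumes that relative abundances and counts are then drawn as p_i ~ Dirichlet(α_i) and y_i ~ Multinomial(N_i, p_i). The names `simulate_counts`, `partial_correlations`, `theta`, `B`, `mu`, and `depth` are hypothetical. The sparse-penalized inference that mLDM performs is not shown. The partial-correlation helper illustrates why a precision matrix encodes conditional dependence: a zero entry of Θ means no direct association, even when the marginal covariance is nonzero.

```python
import numpy as np


def simulate_counts(n_samples, theta, B, mu, depth, rng):
    """Simulate counts from a lognormal-Dirichlet-multinomial hierarchy.

    Illustrative sketch only; this parameterization is an assumption:
      z_i ~ N(mu, inv(theta))           latent microbial log-abundance effects
      alpha_i = exp(x_i @ B + z_i)      positive absolute-abundance parameters
      p_i ~ Dirichlet(alpha_i)          relative abundances (sum to 1)
      y_i ~ Multinomial(depth, p_i)     observed read counts
    """
    p, q = B.shape
    cov = np.linalg.inv(theta)
    cov = (cov + cov.T) / 2                          # symmetrize against rounding error
    X = rng.normal(size=(n_samples, p))              # hypothetical environmental factors
    Z = rng.multivariate_normal(mu, cov, size=n_samples)
    alpha = np.exp(X @ B + Z)                        # shape (n_samples, q)
    counts = np.empty((n_samples, q), dtype=np.int64)
    for i in range(n_samples):
        props = rng.dirichlet(alpha[i])              # compositional layer
        counts[i] = rng.multinomial(depth, props)    # sequencing layer
    return X, alpha, counts


def partial_correlations(theta):
    """Partial correlations implied by a precision matrix: -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(theta))
    pc = -theta / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


# Toy example: 4 microbes, chain 0 - 1 - 2 in the precision matrix, microbe 3 isolated.
theta = np.array([
    [1.0, 0.4, 0.0, 0.0],
    [0.4, 1.0, 0.4, 0.0],
    [0.0, 0.4, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
# Sparse microbe-environment coefficients: factor 0 affects microbe 0, factor 1 affects microbe 2.
B = np.array([
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, -0.5, 0.0],
])
rng = np.random.default_rng(0)
X, alpha, counts = simulate_counts(200, theta, B, np.full(4, 2.0), 5000, rng)

pc = partial_correlations(theta)
cov = np.linalg.inv(theta)
print(counts.sum(axis=1)[:3])                    # each sample's counts sum to the sequencing depth
print(round(cov[0, 2], 3), round(pc[0, 2], 3))   # marginal covariance nonzero, partial correlation zero
```

In this toy chain, microbe 0 is linked to 1 and microbe 1 is linked to 2. Microbes 0 and 2 therefore have a nonzero marginal covariance but a zero partial correlation. This is the kind of indirect connection that the conditional-dependence view is meant to discard. The counts themselves carry only relative information, because each sample is normalized to its sequencing depth. That is the compositional constraint the hierarchy models explicitly through α.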