Statistical Analysis of Metagenomic Data
Abstract
Metagenomics is the study of microbial communities at the genome level by direct sequencing of environmental and clinical samples. Recently developed DNA sequencing technologies have made metagenomics widely applicable, and the field is growing rapidly. Statistical analysis is challenging, however, because of the high variability in the data, which stems from the underlying biological diversity and complexity of microbial communities. Metagenomic data are also high-dimensional, and the number of replicates is typically small. Many standard methods are therefore unsuitable, and new statistical procedures are needed. This thesis contains two papers. In the first paper we evaluate statistical methods for comparative metagenomics. The ability to detect differentially abundant genes and to control error rates is evaluated for eleven methods previously used in metagenomics. Resampled data from a large metagenomic data set are used to provide an unbiased basis for comparisons between methods. The number of replicates, the effect size and the gene abundance are all shown to have a large impact on performance. The statistical characteristics of the evaluated methods can serve as a guide for the statistical analysis in future metagenomic studies. The second paper describes a new statistical method for the analysis of metagenomic data. The underlying model is formulated as a hierarchical Bayesian generalized linear model. A joint prior is placed on the variance parameters and shared between all genes. We evaluate the model and show that it improves the ability to detect differentially abundant genes. This thesis underlines the importance of sound statistical analysis when the data are noisy and high-dimensional. It also demonstrates the potential of statistical modeling within metagenomics.
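The resampling idea mentioned in the abstract can be illustrated with a small sketch. This is an illustration under stated assumptions, not the procedure used in the thesis: samples are drawn without replacement from a pooled count matrix to form two artificial groups, and a known effect is introduced by binomial thinning of the counts of selected genes in one group. The function names, the toy data and the thinning step are all hypothetical.

```python
import random


def split_into_groups(n_samples, n_per_group, rng):
    """Draw two disjoint groups of sample indices without replacement."""
    chosen = rng.sample(range(n_samples), 2 * n_per_group)
    return chosen[:n_per_group], chosen[n_per_group:]


def thin_count(count, keep_prob, rng):
    """Binomial thinning: keep each read independently with probability keep_prob."""
    return sum(1 for _ in range(count) if rng.random() < keep_prob)


def resample_with_effect(counts, n_per_group, affected_genes, keep_prob, rng):
    """Build two groups from counts, where counts[s][g] is the read count
    of gene g in sample s. Genes in affected_genes are thinned in group_b,
    so the true differentially abundant genes are known by construction."""
    idx_a, idx_b = split_into_groups(len(counts), n_per_group, rng)
    group_a = [list(counts[s]) for s in idx_a]
    group_b = [
        [thin_count(c, keep_prob, rng) if g in affected_genes else c
         for g, c in enumerate(counts[s])]
        for s in idx_b
    ]
    return group_a, group_b


# Toy data (made up): 8 samples, 5 genes.
rng = random.Random(0)
toy_counts = [[rng.randint(0, 50) for _ in range(5)] for _ in range(8)]

# Three samples per group; genes 0 and 2 lose about half their reads in group_b.
group_a, group_b = resample_with_effect(toy_counts, 3, {0, 2}, 0.5, rng)
```

Because the affected genes are known, any differential abundance method applied to `group_a` and `group_b` can be scored on how many of them it recovers and how many unaffected genes it flags in error. Repeating the draw many times, with different group sizes and thinning probabilities, would give an estimate of how performance depends on the number of replicates and the effect size.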
Similar resources
mLDM: a new hierarchical Bayesian statistical model for sparse microbial association discovery
Interpretive analysis of metagenomic data depends on an understanding of the underlying associations among microbes from metagenomic samples. Although several statistical tools have been developed for metagenomic association studies, they suffer from compositional bias or fail to take into account environmental factors that directly affect the composition of a given microbial community. In this...
Statistical Approach of Functional Profiling for a Microbial Community
BACKGROUND Metagenomics is a relatively new but fast growing field within environmental biology and medical sciences. It enables researchers to understand the diversity of microbes, their functions, cooperation, and evolution in a particular ecosystem. Traditional methods in genomics and microbiology are not efficient in capturing the structure of the microbial community in an environment. Nowa...
Random Whole Metagenomic Sequencing for Forensic Discrimination of Soils
Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed...
Screening of new microorganisms and their useful genes: from traditional methods to metagenomics
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Microorganisms constitute two thirds of the Earth’s biological diversity. In many environments, 99% of the microorganisms cannot be cultured by standard techniques. Culture-independent methods are required to study the genetic diversity, population structure and ecological roles of the majority of o...
Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting
Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that a...