Modified Bootstrapping and K-Means Clustering for Taxonomic Binning

Author

  • Anna Olson
Abstract

Metagenomics is the study of microbial ecology using genetics as an access point. We seek to understand the microbial communities in environments such as tidal pools, soil, mine runoff, or even the human gut, so that we can assess the impact that microbes have on our world and our health. Metagenomic analysis usually involves determining which species are present in a given sample and, where possible, what each species' genome looks like. Because the vast majority of microbial organisms cannot easily be cultured in a standard lab setting, we instead look at the DNA taken from a sample and try to assign that DNA to the species it came from (the taxonomic binning problem). Many methods for taxonomic binning exist, but they tend to be extremely computationally intensive, dependent on reference databases that remain far from complete, or insufficiently accurate. In this paper, we put forward a new taxonomic binning algorithm called Horatio, a modified bootstrapping and k-means clustering algorithm that uses compositional information about DNA subsequences. Horatio requires relatively little setup and has tunable parameters that can reflect users' preferences. We compare Horatio with other existing taxonomic binning algorithms and evaluate its performance on simulated and real metagenomic datasets.
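The abstract does not spell out Horatio's algorithm, so the following is only a rough sketch of the general idea it names: describe each DNA fragment by its subsequence composition, cluster with k-means, and use resampling to decide which fragments reliably belong together. It assumes tetranucleotide (k = 4) frequency profiles, Euclidean k-means with farthest-first seeding, and a generic co-clustering consensus over bootstrap resamples; this is a standard consensus-clustering technique swapped in for illustration, not Horatio's modified bootstrapping. The functions `kmer_profile`, `kmeans`, and `consensus_bins` and all their defaults are hypothetical.

```python
import itertools
import random
from collections import Counter

# Hypothetical helpers illustrating composition-based binning; not Horatio's code.

def kmer_profile(seq, k=4):
    """Relative frequency of every k-mer over ACGT; k-mers with other symbols are ignored."""
    alphabet = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in alphabet) or 1
    return [counts[km] / total for km in alphabet]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, n_clusters, rng, max_iter=50):
    """Lloyd's k-means, seeded farthest-first from one randomly chosen point."""
    centers = [list(rng.choice(points))]
    while len(centers) < n_clusters:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(n_clusters), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def consensus_bins(points, n_clusters, n_rounds=20, threshold=0.5, seed=0):
    """Bin points by how often they co-cluster across k-means runs on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    drawn = [[0] * n for _ in range(n)]
    for _ in range(n_rounds):
        # Bootstrap draw with replacement; duplicate indices are collapsed before clustering.
        idx = sorted(set(rng.choices(range(n), k=n)))
        if len(idx) < n_clusters:
            continue
        labels = kmeans([points[i] for i in idx], n_clusters, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                drawn[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    # Union-find: link every pair whose co-clustering rate reaches the threshold.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if drawn[i][j] and together[i][j] / drawn[i][j] >= threshold:
                parent[find(i)] = find(j)
    bins = {}
    for i in range(n):
        bins.setdefault(find(i), []).append(i)
    return sorted(bins.values())
```

For example, six synthetic reads built only from G and C and six built only from A and T have disjoint tetranucleotide profiles, so `consensus_bins` should place them in two separate bins. The `threshold` parameter plays the role of a user-tunable knob here: raising it yields smaller, more conservative bins.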

Publication date: 2014