EMDUnifrac: Exact Linear Time Computation of the Unifrac Metric and Identification of Differentially Abundant Organisms
Authors
Abstract
Both the weighted and unweighted Unifrac distances have been employed very successfully to assess whether two communities differ, but they give no information about how the communities differ. We take advantage of recent observations that the Unifrac metric is equivalent to the so-called earth mover's distance (also known as the Kantorovich-Rubinstein metric) to develop an algorithm that not only computes the Unifrac distance in linear time and space, but also simultaneously identifies which operational taxonomic units are responsible for the observed differences between samples. This allows the algorithm, called EMDUnifrac, to determine why given samples are different, not just whether they are different, with no added computational burden. EMDUnifrac can be applied to any distribution on a tree, so it is particularly well suited to analyzing both operational taxonomic units derived from amplicon sequencing and community profiles obtained by classifying whole genome shotgun metagenomes. The EMDUnifrac source code (written in Python) is freely available at: https://github.com/dkoslicki/EMDUnifrac.
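The abstract describes the algorithm only at a high level. Below is a minimal sketch of the underlying idea, not the code from the EMDUnifrac repository: on a tree, the earth mover's distance can be computed in a single children-before-parents pass by pushing each subtree's net surplus of P over Q up its parent edge and charging branch length times the absolute surplus. The function name `emd_unifrac_weighted`, the parent-array tree encoding, and the signed `diffab` output are hypothetical choices made for illustration, and P and Q are assumed to be relative abundances that each sum to 1.

```python
from typing import Dict, List, Sequence, Tuple


def emd_unifrac_weighted(
    parent: Sequence[int],
    length: Sequence[float],
    p: Sequence[float],
    q: Sequence[float],
    eps: float = 1e-12,
) -> Tuple[float, Dict[Tuple[int, int], float]]:
    """Hypothetical sketch: weighted Unifrac as an earth mover's distance on a tree.

    Assumptions: nodes are numbered 0..n-1 so that every child comes before
    its parent (the root is node n-1 with parent[n-1] == -1), length[i] is
    the branch length of the edge (i, parent[i]), and p, q are abundance
    vectors over the nodes that each sum to 1.
    """
    n = len(parent)
    # partial[i] accumulates the net surplus of p over q in the subtree at i.
    partial: List[float] = [pi - qi for pi, qi in zip(p, q)]
    distance = 0.0
    # Signed per-edge contribution: positive means p has more mass below the edge.
    diffab: Dict[Tuple[int, int], float] = {}
    for i in range(n - 1):  # every node except the root, children first
        surplus = partial[i]
        # Any surplus in this subtree must cross the edge to its parent.
        partial[parent[i]] += surplus
        distance += length[i] * abs(surplus)
        if abs(surplus) > eps:
            diffab[(i, parent[i])] = length[i] * surplus
    return distance, diffab


if __name__ == "__main__":
    # Root 4 has children 2 and 3; node 2 has leaf children 0 and 1.
    parent = [2, 2, 4, 4, -1]
    length = [1.0, 1.0, 1.0, 1.0, 0.0]
    p = [0.5, 0.5, 0.0, 0.0, 0.0]
    q = [0.0, 0.0, 0.0, 1.0, 0.0]
    d, diffab = emd_unifrac_weighted(parent, length, p, q)
    print(d)  # 3.0
    print(diffab)
```

Each node is visited once, so time and extra space are linear in the number of nodes. The signed `diffab` entries indicate which parts of the tree drive the distance: a positive value on an edge means the first sample carries more mass below that edge, and a negative value means the second sample does.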