Maximum Likelihood of Evolutionary Trees Is Hard
Abstract
Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees (Felsenstein, 1981). Finding optimal ML trees appears to be a very hard computational task, but for tractable cases, ML is the method of choice. In particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for the second major character-based criterion, maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years (Day, Johnson and Sankoff, 1986; reduction from vertex cover), such a hardness result for ML has so far eluded researchers in the field. An important work by Tuffley and Steel (1997) proves quantitative relations between the parsimony values of given sequences and the corresponding log-likelihood values. However, a direct application of it would only give an exponential-time reduction from MP to ML. Another step in this direction was recently made by Addario-Berry et al. (2004), who proved that ancestral maximum likelihood (AML) is NP-complete. AML “lies in between” the two problems, having some properties of MP and some properties of ML. Still, the AML proof is not directly applicable to the ML problem. We resolve the question, showing that “regular” ML on phylogenetic trees is indeed intractable. Our reduction follows those for MP and AML, but its starting point is an approximation version of vertex cover, known as gap VC. The crux of our work is not the reduction but its correctness proof. The proof goes through a series of tree modifications while controlling the likelihood losses at each step, using the bounds of Tuffley and Steel. The proof can be viewed as correlating the value of any ML solution to an arbitrarily close approximation to vertex cover.
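To make the two criteria concrete, here is a minimal Python sketch (not taken from the paper) that scores binary characters on a hypothetical four-taxon tree ((a,b),(c,d)). It computes the parsimony score with Fitch's algorithm and the likelihood under a two-state symmetric substitution model with Felsenstein's pruning algorithm. The tree, the node names and the helper functions (`fitch`, `partial`, `likelihood`, `max_likelihood_on_grid`) are illustrative assumptions. One further assumption matters: the edge flip probabilities are maximized separately for each character (the "no common mechanism" setting). Under that setting, Tuffley and Steel's relation takes the simple per-character form max L = (1/2)^(ps+1). In the "regular" ML problem, by contrast, a single set of edge parameters is shared by all characters, and only bounds of this kind remain available.

```python
import math
from itertools import product

# Hypothetical rooted tree ((a,b),(c,d)) with root "r"; each edge is named by its child node.
CHILDREN = {"r": ["u", "v"], "u": ["a", "b"], "v": ["c", "d"]}
EDGES = ["u", "v", "a", "b", "c", "d"]


def fitch(node, char):
    """Fitch's algorithm: (candidate state set, number of changes) for the subtree at node."""
    if node not in CHILDREN:
        return {char[node]}, 0
    (s1, c1), (s2, c2) = [fitch(child, char) for child in CHILDREN[node]]
    common = s1 & s2
    if common:
        return common, c1 + c2
    return s1 | s2, c1 + c2 + 1


def partial(node, char, p):
    """Felsenstein pruning: P(leaf states below node | node state s) for s = 0, 1."""
    if node not in CHILDREN:
        return [1.0 if char[node] == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child in CHILDREN[node]:
        below = partial(child, char, p)
        q = p[child]  # probability that the state flips along the edge to child
        for s in (0, 1):
            result[s] *= (1 - q) * below[s] + q * below[1 - s]
    return result


def likelihood(char, p):
    """Two-state symmetric model with a uniform root distribution (1/2, 1/2)."""
    root = partial("r", char, p)
    return 0.5 * root[0] + 0.5 * root[1]


def max_likelihood_on_grid(char, grid=(0.0, 0.25, 0.5)):
    """Maximize one character's likelihood over per-edge flip probabilities taken from grid."""
    return max(
        likelihood(char, dict(zip(EDGES, values)))
        for values in product(grid, repeat=len(EDGES))
    )


for char in ({"a": 0, "b": 1, "c": 0, "d": 1}, {"a": 0, "b": 0, "c": 1, "d": 1}):
    _, ps = fitch("r", char)
    best = max_likelihood_on_grid(char)
    print(ps, best, math.isclose(best, 0.5 ** (ps + 1)))
```

The likelihood is linear in each edge's flip probability taken separately. Its maximum over [0, 1/2] on every edge is therefore reached at an endpoint, so the coarse grid, which includes the limiting value 1/2 of an infinitely long edge, loses nothing. For the two example characters (parsimony scores 2 and 1), the grid maximum should equal (1/2)^(ps+1).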
Similar resources
Maximum likelihood of evolutionary trees: hardness and approximation
MOTIVATION: Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees. Yet the computational complexity of ML was open for over 20 years, and only recently resolved by the authors for the Jukes-Cantor model of substitution and its generalizations. It was proved that reconstructing the ML tree is computationally intractable (NP-hard). In this work we...
Ancestral Maximum Likelihood of Evolutionary Trees Is Hard
Maximum likelihood (ML) (Neyman, 1971) is an increasingly popular optimality criterion for selecting evolutionary trees. Finding optimal ML trees appears to be a very hard computational task; in particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for maximum parsimony (MP). However, while MP has been known to be NP-complete for over 20 years, no such h...
Analytic solutions of maximum likelihood on forks of four taxa.
This work deals with symbolic mathematical solutions to maximum likelihood on small phylogenetic trees. Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. In this work, we give general analytic solutions for a family of trees with four taxa, two-state characters, under a molecular...
Operations Research Ph.D. Final Exam
Phylogenetics is the study of evolutionary relations between different organisms. Phylogenetic trees are the representations of these relations. Researchers have been working on finding fast and systematic approaches to reconstruct phylogenetic trees from observed data for over 40 years. It has been shown that, given a certain criterion to evaluate each tree, finding the best fitted phylogeneti...
Finding the Maximum Likelihood Tree is Hard
Maximum likelihood (ML) is an increasingly popular optimality criterion for selecting evolutionary trees (Felsenstein, 1981). Finding optimal ML trees appears to be a very hard computational task, but for tractable cases, ML is the method of choice. In particular, algorithms and heuristics for ML take longer to run than algorithms and heuristics for the second major character-based criterion, m...