A rapid bootstrap algorithm for the RAxML Web servers.
نویسندگان
چکیده
Despite recent advances achieved by application of high-performance computing methods and novel algorithmic techniques to maximum likelihood (ML)-based inference programs, the major computational bottleneck still consists in the computation of bootstrap support values. Conducting a probably insufficient number of 100 bootstrap (BS) analyses with current ML programs on large datasets-either with respect to the number of taxa or base pairs-can easily require a month of run time. Therefore, we have developed, implemented, and thoroughly tested rapid bootstrap heuristics in RAxML (Randomized Axelerated Maximum Likelihood) that are more than an order of magnitude faster than current algorithms. These new heuristics can contribute to resolving the computational bottleneck and improve current methodology in phylogenetic analyses. Computational experiments to assess the performance and relative accuracy of these heuristics were conducted on 22 diverse DNA and AA (amino acid), single gene as well as multigene, real-world alignments containing 125 up to 7764 sequences. The standard BS (SBS) and rapid BS (RBS) values drawn on the best-scoring ML tree are highly correlated and show almost identical average support values. The weighted RF (Robinson-Foulds) distance between SBS- and RBS-based consensus trees was smaller than 6% in all cases (average 4%). More importantly, RBS inferences are between 8 and 20 times faster (average 14.73) than SBS analyses with RAxML and between 18 and 495 times faster than BS analyses with competing programs, such as PHYML or GARLI. Moreover, this performance improvement increases with alignment size. Finally, we have set up two freely accessible Web servers for this significantly improved version of RAxML that provide access to the 200-CPU cluster of the Vital-IT unit at the Swiss Institute of Bioinformatics and the 128-CPU cluster of the CIPRES project at the San Diego Supercomputer Center. These Web servers offer the possibility to conduct large-scale phylogenetic inferences to a large part of the community that does not have access to, or the expertise to use, high-performance computing resources.
منابع مشابه
Load Balancing Approaches for Web Servers: A Survey of Recent Trends
Numerous works has been done for load balancing of web servers in grid environment. Reason behinds popularity of grid environment is to allow accessing distributed resources which are located at remote locations. For effective utilization, load must be balanced among all resources. Importance of load balancing is discussed by distinguishing the system between without load balancing and with loa...
متن کاملUltrafast Approximation for Phylogenetic Bootstrap
Nonparametric bootstrap has been a widely used tool in phylogenetic analysis to assess the clade support of phylogenetic trees. However, with the rapidly growing amount of data, this task remains a computational bottleneck. Recently, approximation methods such as the RAxML rapid bootstrap (RBS) and the Shimodaira-Hasegawa-like approximate likelihood ratio test have been introduced to speed up t...
متن کاملThe RAxML 7.0.4 Manual
In addition to the sequential version, RAxML offers two ways to exploit parallelism: fine-grained parallelism that can be exploited on shared memory machines or multi-core architectures and coarse-grained parallelism that can be exploited on Linux clusters. The current version of RAxML is a highly optimized program, which handles DNA and AA alignments under various models of substitution and se...
متن کاملStatistical Topology Using the Nonparametric Density Estimation and Bootstrap Algorithm
This paper presents approximate confidence intervals for each function of parameters in a Banach space based on a bootstrap algorithm. We apply kernel density approach to estimate the persistence landscape. In addition, we evaluate the quality distribution function estimator of random variables using integrated mean square error (IMSE). The results of simulation studies show a significant impro...
متن کاملA New Bootstrap Based Algorithm for Hotelling’s T2 Multivariate Control Chart
Normality is a common assumption for many quality control charts. One should expect misleading results once this assumption is violated. In order to avoid this pitfall, we need to evaluate this assumption prior to the use of control charts which require normality assumption. However, in certain cases either this assumption is overlooked or it is hard to check. Robust control charts and bootstra...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Systematic biology
دوره 57 5 شماره
صفحات -
تاریخ انتشار 2008