The Impact of the Rate Prior on Bayesian Estimation of Divergence Times with Multiple Loci
Abstract
Bayesian methods provide a powerful way to estimate species divergence times by combining information from molecular sequences with information from the fossil record. With the explosive increase of genomic data, divergence time estimation increasingly uses data from multiple loci (genes or site partitions). Widely used computer programs for estimating divergence times place independent and identically distributed (i.i.d.) priors on the substitution rates of the different loci. The i.i.d. prior is problematic: as the number of loci (L) increases, the prior variance of the average rate across all loci goes to zero at the rate 1/L. As a consequence, the rate prior dominates posterior time estimates when many loci are analyzed, and if the rate prior is misspecified, the estimated divergence times will converge to wrong values with very narrow credibility intervals. Here we develop a new prior on the locus rates, based on the Dirichlet distribution, that corrects the problematic behavior of the i.i.d. prior. We use computer simulation and real data analysis to highlight the differences between the old and new priors. For a data set of six primate species, we show that with the old i.i.d. prior, if the prior rate is too high (or too low), the estimated divergence times are too young (or too old) and fall outside the bounds imposed by the fossil calibrations. In contrast, with the new Dirichlet prior, posterior time estimates are insensitive to the rate prior and are compatible with the fossil calibrations. We re-analyzed a phylogenomic data set of 36 mammal species and show that using many fossil calibrations can alleviate the adverse impact of a misspecified rate prior to some extent. We recommend the use of the new Dirichlet prior in Bayesian divergence time estimation. [Bayesian inference, divergence time, relaxed clock, rate prior, partition analysis.]
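The 1/L behavior described in the abstract can be checked with a small simulation. The sketch below is illustrative only and is not the authors' implementation: it assumes a Gamma(shape=2, scale=0.5) rate prior (mean 1, variance 0.5), and it assumes one plausible Dirichlet-based construction in which a single mean rate is drawn from that gamma prior and then divided among loci with symmetric Dirichlet weights. The function names and the concentration parameter are hypothetical. Under the i.i.d. prior the variance of the average rate is 0.5/L, whereas under the assumed Dirichlet construction the locus rates average exactly to the drawn mean rate, so the variance stays near 0.5 for any L.

```python
import random
import statistics


def iid_mean_rate(num_loci, shape, scale, rng):
    """Average of num_loci locus rates drawn i.i.d. from Gamma(shape, scale)."""
    return sum(rng.gammavariate(shape, scale) for _ in range(num_loci)) / num_loci


def dirichlet_locus_rates(num_loci, shape, scale, concentration, rng):
    """Assumed construction: draw one mean rate from Gamma(shape, scale), then
    split it across loci with symmetric Dirichlet weights. The returned locus
    rates average exactly to the drawn mean rate."""
    mean_rate = rng.gammavariate(shape, scale)
    # A Dirichlet draw is a vector of independent gamma draws divided by their sum.
    weights = [rng.gammavariate(concentration, 1.0) for _ in range(num_loci)]
    total = sum(weights)
    return [mean_rate * num_loci * w / total for w in weights]


def prior_variance_of_mean(sampler, num_draws):
    """Monte Carlo estimate of the prior variance of the across-locus mean rate."""
    return statistics.variance([sampler() for _ in range(num_draws)])


rng = random.Random(0)
for num_loci in (1, 10, 100):
    v_iid = prior_variance_of_mean(
        lambda: iid_mean_rate(num_loci, 2.0, 0.5, rng), 2000)
    v_dir = prior_variance_of_mean(
        lambda: sum(dirichlet_locus_rates(num_loci, 2.0, 0.5, 1.0, rng)) / num_loci,
        2000)
    print(f"L={num_loci:4d}  i.i.d. variance={v_iid:.4f}  Dirichlet variance={v_dir:.4f}")
```

With many loci, the i.i.d. column shrinks toward zero while the Dirichlet column does not, which is why a misspecified i.i.d. rate prior becomes increasingly informative about the average rate as loci are added.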
Similar resources
A Note on Evolutionary Rate Estimation in Bayesian Evolutionary Analysis: Focus on Pathogens
Bayesian evolutionary analysis provides a statistically sound and flexible framework for estimation of evolutionary parameters. In this method, posterior estimates of evolutionary rate (μ) are derived by combining evolutionary information in the data with the researcher's prior knowledge about the true value of μ. Nucleotide sequence samples of fast evolving pathogens that are taken at d...
Estimation of parameter of proportion in Binomial Distribution Using Adjusted Prior Distribution
Historically, various methods have been suggested for estimating the parameter of the Bernoulli and binomial distributions. One of these is the Bayesian method, which is based on employing a prior distribution. Sound selection of the prior on the parameter space plays a crucial role in reducing the error of the posterior Bayesian estimator. At times, large scale of the parametric changes on parameter space bring...
Divergence times and morphological evolution of the subtribe Eritrichiinae (Boraginaceae-Rochelieae) with special reference to Lappula
The subtribe Eritrichiinae belongs to the tribe Rochelieae (Boraginaceae; Cynoglossoideae), which is composed of about 200 species in five genera: Eritrichium, Lappula, Hackelia, Lepechiniella, and Rochelia. The majority of the species are annual and grow in xeric habitats. The genus Lappula as an arid adapted and the second biggest genus...
Improving the Performance of Bayesian Estimation Methods in Estimations of Shift Point and Comparison with MLE Approach
A Bayesian analysis is used to detect a change point in a sequence of independent random variables from exponential distributions. In this paper, we estimate the change point that occurs in a sequence of independent exponential observations. The Bayes estimators are derived for the change point, the rate of the exponential distribution before the shift, and the rate of the exponential distribution after s...
Inferring speciation times under an episodic molecular clock.
We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergenc...