Erasing errors due to alignment ambiguity when estimating positive selection.

Author

  • Benjamin Redelings
Abstract

Current estimates of diversifying positive selection rely on first having an accurate multiple sequence alignment. Simulation studies have shown that under biologically plausible conditions, relying on a single estimate of the alignment from commonly used alignment software can lead to unacceptably high false-positive rates in detecting diversifying positive selection. We present a novel statistical method that eliminates excess false positives resulting from alignment error by jointly estimating the degree of positive selection and the alignment under an evolutionary model. Our model treats both substitutions and insertions/deletions as sequence changes on a tree and allows site heterogeneity in the substitution process. We conduct inference starting from unaligned sequence data by integrating over all alignments. This approach naturally accounts for ambiguous alignments without requiring ambiguously aligned sites to be identified and removed prior to analysis. We take a Bayesian approach and conduct inference using Markov chain Monte Carlo to integrate over all alignments on a fixed evolutionary tree topology. We introduce a Bayesian version of the branch-site test and assess the evidence for positive selection using Bayes factors. We compare two models of differing dimensionality using a simple alternative to reversible-jump methods. We also describe a more accurate method of estimating the Bayes factor using Rao-Blackwellization. We then show using simulated data that jointly estimating the alignment and the presence of positive selection solves the problem with excessive false positives from erroneous alignments and has nearly the same power to detect positive selection as when the true alignment is known. We also show that samples taken from the posterior alignment distribution using the software BAli-Phy have substantially lower alignment error compared with MUSCLE, MAFFT, PRANK, and FSA alignments.

Similar resources

A Quality Assurance Algorithm for SeaWinds

The scatterometer wind retrieval process produces several possible wind vector choices or ambiguities at each resolution cell. Ambiguity selection routines are generally ad hoc and often result in ambiguity selection errors. It is important to locate areas of ambiguity selection error to assess the quality of scatterometer wind data. A quality assurance algorithm is presented based on comparing...

An Algorithm to Assess the Accuracy of NSCAT Ambiguity Removal

A wind field model can be used to evaluate the accuracy of pointwise ambiguity removal for NASA Scatterometer (NSCAT) data. Errors in pointwise ambiguity removal result in large model-fit errors when the pointwise wind estimates are assimilated into the model. By thresholding the error, regions containing ambiguity removal error can be identified. For these regions, the ambiguity selection can ...

Gyroscope Drift Error Analysis in the Position-Independent Navigation Algorithm of a stable platform Inertial System

This paper analyzes gyroscope drift error in the position-independent navigation algorithm of a stable platform inertial system. Most stable platform navigation algorithms proposed in the literature have the drawback of estimating position rates for alignment commands. Not only are the estimated position rates the basic source of position errors, but they also make the alignmen...

The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection.

The detection of positive Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The "branch-site" test is designed to detect localized episodic bouts of positive selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-positive rates for a wide range of selection schemes. ...

An Algorithm to Assess the Accuracy of NASA Scatterometer Data - Geoscience and Remote Sensing Symposium Proceedings, 1998. IGARSS '98. 1998 IEEE International

A simple wind field model can be used to evaluate the accuracy of pointwise ambiguity removal for NASA Scatterometer (NSCAT) data. Errors in pointwise ambiguity removal result in large model-fit errors when the pointwise wind estimates are assimilated into the model. By thresholding the error, regions containing ambiguity removal error can be identified. For many of these regions, the ambiguity...

Journal:
  • Molecular biology and evolution

Volume 31, Issue 8

Publication date: 2014