Boosting Alignment Accuracy by Adaptive Local Realignment
Authors
Abstract
While mutation rates can vary markedly over the residues of a protein, multiple sequence alignment tools typically use the same values for their scoring-function parameters across a protein's entire length. We present a new approach, called adaptive local realignment, which in contrast automatically adapts to the diversity of mutation rates along protein sequences. It builds upon a recent technique known as parameter advising, which finds global parameter settings for aligners, extending it to adaptively find local settings. Our approach in essence identifies local regions with low estimated accuracy, constructs a set of candidate realignments for each region using a carefully chosen collection of parameter settings, and replaces a region whenever one of its realignments has higher estimated accuracy. This new method of local parameter advising, when combined with prior methods for global advising, boosts alignment accuracy by as much as 26% over the best default setting on hard-to-align protein benchmarks, and by 6.4% over global advising alone. Adaptive local realignment, implemented within the Opal aligner using the Facet accuracy estimator, is available at facet.cs.arizona.edu.
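To make the workflow concrete, the following is a minimal Python sketch of the realignment loop the abstract describes, under simplifying assumptions: an alignment is a list of equal-length rows (one string per sequence, with '-' for gaps), regions are fixed-width column windows, and the accuracy estimator and per-region realigner are supplied as callables. All function and parameter names here (adaptive_local_realignment, extract_window, splice_window, estimate_accuracy, realign, window_size, accuracy_threshold) are hypothetical stand-ins, not the Opal or Facet API.

def extract_window(alignment, start, end):
    # Slice the block of columns [start, end) out of every row.
    return [row[start:end] for row in alignment]

def splice_window(alignment, start, end, new_block):
    # Replace columns [start, end) of every row with the realigned block.
    return [row[:start] + new_row + row[end:]
            for row, new_row in zip(alignment, new_block)]

def adaptive_local_realignment(alignment, parameter_settings,
                               estimate_accuracy, realign,
                               window_size=30, accuracy_threshold=0.7):
    # Scan the alignment in column windows; for any window whose estimated
    # accuracy falls below the threshold, realign it under each candidate
    # parameter setting and keep the highest-scoring candidate only if it
    # beats the current window's estimated accuracy.
    start = 0
    while start < len(alignment[0]):
        end = min(start + window_size, len(alignment[0]))
        block = extract_window(alignment, start, end)
        best_block, best_score = block, estimate_accuracy(block)
        if best_score < accuracy_threshold:
            for params in parameter_settings:
                candidate = realign(block, params)
                score = estimate_accuracy(candidate)
                if score > best_score:
                    best_block, best_score = candidate, score
            if best_block is not block:
                alignment = splice_window(alignment, start, end, best_block)
                end = start + len(best_block[0])  # realigned block may change width
        start = end
    return alignment

As a toy usage, one can score a block by its fraction of non-gap characters and "realign" by returning the block unchanged (a real realigner would rerun an aligner such as Opal on the block's residues under the given parameter setting):

toy_alignment = ["MKV--LT", "MKVA-LT", "MK---LT"]
realigned = adaptive_local_realignment(
    toy_alignment,
    parameter_settings=[{"gap_open": -11}, {"gap_open": -7}],
    estimate_accuracy=lambda b: sum(c != '-' for r in b for c in r) / (len(b) * len(b[0])),
    realign=lambda block, params: block,
    window_size=4,
    accuracy_threshold=0.8,
)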
Related articles
Boosting alignment accuracy through adaptive local realignment
Motivation: While mutation rates can vary across the residues of a protein, when computing alignments of protein sequences the same setting of values for substitution score and gap penalty parameters is typically used across their entire length. We provide for the first time a new method called adaptive local realignment that automatically uses diverse parameter settings in different regions of...
Adaptive Boosting for Spatial Functions with Unstable Driving Attributes
Combining multiple global models (e.g. back-propagation based neural networks) is an effective technique for improving classification accuracy by reducing variance through manipulating training data distributions. Standard combining methods do not improve local classifiers (e.g. k-nearest neighbors) due to their low sensitivity to data perturbation. Here, we propose an adaptive attribute boosting ...
Adaptive boosting techniques in heterogeneous and spatial databases
Combining multiple classifiers is an effective technique for improving classification accuracy by reducing the variance through manipulating the training data distributions. In many large-scale data analysis problems involving heterogeneous databases with attribute instability, however, standard boosting methods do not improve local classifiers (e.g. k-nearest neighbors) due to their low sensitivity ...
Optimally-Smooth Adaptive Boosting and Application to Agnostic Learning
We describe a new boosting algorithm that is the first such algorithm to be both smooth and adaptive. These two features make possible performance improvements for many learning tasks whose solutions use a boosting technique. The boosting approach was originally suggested for the standard PAC model; we analyze possible applications of boosting in the context of agnostic learning, which is more ...
Boosting in the presence of outliers: adaptive classification with non-convex loss functions
This paper examines the role and efficiency of the non-convex loss functions for binary classification problems. In particular, we investigate how to design a simple and effective boosting algorithm that is robust to the outliers in the data. The analysis of the role of a particular non-convex loss for prediction accuracy varies depending on the diminishing tail properties of the gradient of th...