Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
Abstract
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are becoming a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to existing alignments and for exploiting the information they contain, making use of the vast amount of precomputed information in public databases like Pfam.
Related articles
Improvement of clustal-derived sequence alignments with evolutionary algorithms
Multiple sequence alignment (MSA) is a central problem in bioinformatics. In this study, we extended previous efforts using evolutionary algorithms (EAs) for MSA. Candidate solutions in the initial population were derived from the well-known alignment program Clustal X. Evolutionary computation was then used to evolve increasingly appropriate solutions. Three new alignment operators were introd...
The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.
CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight ...
Performance Optimization of Clustal W: Parallel Clustal W, HT Clustal, and MULTICLUSTAL
Multiple sequence alignments represent a class of powerful bioinformatics tools with many uses in computational biology. Knowledge of multiple alignments (MA) helps to predict secondary and tertiary structures and to detect homologies between newly sequenced genes and existing gene (protein) families. With the adoption of high-throughput (HT) automation, offering scientists significantly more d...
Making automated multiple alignments of very large numbers of protein sequences
MOTIVATION: Recent developments in sequence alignment software have made possible multiple sequence alignments (MSAs) of >100 000 sequences in reasonable times. At present, there are no systematic analyses concerning the scalability of the alignment quality as the number of aligned sequences is increased. RESULTS: We benchmarked a wide range of widely used MSA packages using a selection of prot...
Improvement of Structure Conservation Index with Centroid Estimators
RNAz, a support vector machine (SVM) approach for identifying functional non-coding RNAs (ncRNAs), has been proven to be one of the most accurate tools for this goal. Among the measurements used in RNAz, the Structure Conservation Index (SCI), which evaluates the evolutionary conservation of RNA secondary structures in terms of folding energies, has been reported to have an extremely high discri...