Generalized affine gap costs for protein sequence alignment.
Abstract
Based on the observation that a single mutational event can delete or insert multiple residues, affine gap costs for sequence alignment charge a penalty for the existence of a gap, and a further length-dependent penalty. From structural or multiple alignments of distantly related proteins, it has been observed that conserved residues frequently fall into ungapped blocks separated by relatively nonconserved regions. To take advantage of this structure, a simple generalization of affine gap costs is proposed that allows nonconserved regions to be effectively ignored. The distribution of scores from local alignments using these generalized gap costs is shown empirically to follow an extreme value distribution. Examples are presented for which generalized affine gap costs yield superior alignments from the standpoints both of statistical significance and of alignment accuracy. Guidelines for selecting generalized affine gap costs are discussed, as is their possible application to multiple alignment.
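The abstract describes the two cost schemes only in words. As a minimal sketch, the functions below compare them under an assumed parameterization that the abstract itself does not state: an opening cost `a`, a cost `b` for each residue inserted or deleted, and a cost `c` for each pair of residues left unaligned opposite one another. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
def affine_gap_cost(k: int, a: float, b: float) -> float:
    """Ordinary affine cost of a gap of k residues: a to open, b per residue.

    Assumed convention (not stated in the abstract): total = a + b * k.
    """
    if k < 0:
        raise ValueError("gap length must be non-negative")
    return 0 if k == 0 else a + b * k


def generalized_affine_gap_cost(k1: int, k2: int, a: float, b: float, c: float) -> float:
    """Cost of skipping k1 residues in one sequence and k2 in the other.

    Assumed form (an illustration, not quoted from the abstract):
    a + b * |k1 - k2| + c * min(k1, k2).
    The min(k1, k2) residue pairs are left unaligned at cost c each;
    the |k1 - k2| surplus residues are charged b each, as in an ordinary gap.
    """
    if k1 < 0 or k2 < 0:
        raise ValueError("region lengths must be non-negative")
    if k1 == 0 and k2 == 0:
        return 0
    return a + b * abs(k1 - k2) + c * min(k1, k2)


# Hypothetical parameters, for illustration only.
a, b, c = 12, 2, 1

# When only one sequence has extra residues, both schemes agree.
one_sided = generalized_affine_gap_cost(5, 0, a, b, c)

# A nonconserved region of 30 residues facing one of 28 residues:
# skipped as a single generalized gap, or as two ordinary affine gaps.
skipped_region = generalized_affine_gap_cost(30, 28, a, b, c)
two_affine_gaps = affine_gap_cost(30, a, b) + affine_gap_cost(28, a, b)
```

With these values, skipping the region as one generalized gap costs far less than opening two ordinary gaps, which is the sense in which a nonconserved region can be "effectively ignored" between conserved blocks.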
Similar resources
A generalized affine gap model significantly improves protein sequence alignment accuracy.
Sequence alignment underpins common tasks in molecular biology, including genome annotation, molecular phylogenetics, and homology modeling. Fundamental to sequence alignment is the placement of gaps, which represent character insertions or deletions. We assessed the ability of a generalized affine gap cost model to reliably detect remote protein homology and to produce high-quality alignments....
Dynamic Gap Selector: A Smith Waterman Sequence Alignment Algorithm with Affine Gap Model Optimization
The Smith-Waterman algorithm (S-W) is a widespread method to perform local alignments of biological sequences of proteins, DNA and RNA molecules. Indeed, S-W is able to ensure better accuracy levels with respect to the heuristic alignment algorithms by extensively exploring all the possible alignment configurations between the sequences under examination. It has been proven that the first amino acid...
Multiple Structural RNA Alignment with Affine Gap Costs Based on Lagrangian Relaxation
In this thesis the structural alignment of RNA sequences is addressed, a topic of crucial significance in the field of computational biology. Contrary to alignments of DNA, alignments of RNA are not only aligned based on sequence information, but largely depend on the correct structural alignment. Since the functions of RNA depend mostly on its secondary structure and this is highly conserved t...
Optimal sequence alignment using affine gap costs.
When comparing two biological sequences, it is often desirable for a gap to be assigned a cost not directly proportional to its length. If affine gap costs are employed, in other words if opening a gap costs v and each null in the gap costs u, the algorithm of Gotoh (1982, J. molec. Biol. 162, 705) finds the minimum cost of aligning two sequences in order MN steps. Gotoh’s algorithm attempts to ...
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
Journal:
- Proteins
Volume 32, Issue 1
Pages: -
Publication date: 1998