Multiple alignment and multiple sequence based searches

Author

  • Sean R. Eddy
Abstract

Multiple sequence alignments reveal patterns of conservation that can be exploited in database searches using “profile” methods. Starting with a single nematode sequence that has no informative BLAST hits, I give a real example of the use of multiple alignment and profile search software to detect informative remote homologies.

It used to be that most new sequences were novel, with no informative similarity to anything in the sequence database. Thanks to genome sequencing projects, things are slightly better now. New sequences are often similar to several uncharacterized sequences, defining whole families of novel genes with no informative BLAST or FASTA similarities.

Given a sequence family, though, powerful alternative similarity search methods can be applied. Software packages are available that can take a multiple sequence alignment and build a “profile” of it. Profiles incorporate position-specific scoring information derived from the frequency that a given residue is seen in an aligned column. Because sequence families preferentially conserve certain critical residues and motifs, this information can sometimes allow more sensitive database searches to be done. Most new profile software is based on statistical models called “hidden Markov models” (HMMs).

Here, I show a practical demonstration of a multiple alignment based similarity search. Much more comprehensive reviews of the literature on profile hidden Markov model methods are available elsewhere [1, 2, 3], including two recent books [4, 5].

An example sequence

In the C. elegans genome, several large paralogous gene families that were first thought to be nematode specific have since been classified as putative G-protein coupled receptors (GPCRs) [6, 7]. Detecting similarity between these nematode sequences and known GPCRs in other organisms is a nontrivial sequence analysis task. I arbitrarily chose the putative GPCR gene sra-4 (Wormpep AH6.8; SWISS-PROT SRA4_CAEEL; 329 aa) as an example.
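The position-specific scoring idea described above (scores derived from how often each residue appears in an aligned column) can be illustrated with a minimal sketch. This is a plain frequency-based profile with log-odds scores, not a profile HMM and not the method of any particular software package. The toy alignment, the uniform background distribution, the pseudocount of 1, and the function names `build_profile` and `score_sequence` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy alignment (not taken from the article); '-' marks a gap.
ALIGNMENT = [
    "ACDW",
    "ACEW",
    "ACDW",
    "GC-W",
]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies
    profile = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        scores = {}
        for res in ALPHABET:
            freq = (counts[res] + pseudocount) / total
            scores[res] = math.log2(freq / background)
        profile.append(scores)
    return profile


def score_sequence(profile, sequence):
    """Sum position-specific scores for an ungapped sequence of profile length."""
    if len(sequence) != len(profile):
        raise ValueError("sequence length must match the number of profile columns")
    return sum(column[res] for column, res in zip(profile, sequence))


profile = build_profile(ALIGNMENT)
conserved = score_sequence(profile, "ACDW")  # matches the conserved columns
unrelated = score_sequence(profile, "KLMN")  # residues absent from the alignment
print(f"conserved-like: {conserved:.2f}, unrelated: {unrelated:.2f}")
```

A sequence that matches the conserved columns scores higher than one built from residues never seen in the alignment, which is the effect that lets a profile search weight critical positions more heavily than a single-sequence comparison can. Real profile software additionally models insertions and deletions, uses realistic background frequencies and sequence weighting, and reports statistical significance.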
The task is to find a significant similarity between AH6.8 and a protein of known function in another organism. A WWW BLAST search at NCBI [8] using AH6.8 as a query (BLASTP 2.0.4, default options, vs. 319,187 sequences in the NR database on 7/30/98) shows 46 hits with E-values less than , but all but one of them are to uncharacterized C. elegans sequences. The top scoring non-worm hits are a mitochondrial

Similar resources

An Application of the ABS LX Algorithm to Multiple Sequence Alignment

We present an application of ABS algorithms for multiple sequence alignment (MSA). The Markov decision process (MDP) based model leads to a linear programming problem (LPP), whose solution is linked to a suggested alignment. The important features of our work include the facility of alignment of multiple sequences simultaneously and no limit for the length of the sequences. Our goal here is to ...

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

PROMALS: towards accurate multiple sequence alignments of distantly related proteins

MOTIVATION Accurate multiple sequence alignments are essential in protein structure modeling, functional prediction and efficient planning of experiments. Although the alignment problem has attracted considerable attention, preparation of high-quality alignments for distantly related sequences remains a difficult task. RESULTS We developed PROMALS, a multiple alignment method that shows promi...

Use of multiple profiles corresponding to a sequence alignment enables effective detection of remote homologues

MOTIVATION Position specific scoring matrices (PSSMs) corresponding to aligned sequences of homologous proteins are commonly used in homology detection. A PSSM is generated on the basis of one of the homologues as a reference sequence, which is the query in the case of PSI-BLAST searches. The reference sequence is chosen arbitrarily while generating PSSMs for reverse BLAST searches. In this wor...

Multiple molecular sequence alignment by island parallel genetic algorithm

This paper presents an evolution-based approach for solving multiple molecular sequence alignment. The approach is based on the island parallel genetic algorithm that relies on the fitness distribution over the population of alignments. The algorithm searches for an alignment among the independent isolated evolving populations by optimizing weighted sum of pairs objective function which measure...

Publication date: 1998