Technical Reports in Taxonomy 00-01: On the Dangers of Aligning RNA Sequences Using “Conserved” Motifs
Abstract
Aligning RNA sequences can be a challenging task. Automatic sequence alignment programs typically align sequences only with respect to primary sequence, and as a result may yield spurious alignments. Incorporating information on RNA secondary structure can improve the alignment (Kjer, 1995; Titus and Frost, 1996), but this must usually be done by hand. Various algorithms and programs exist that incorporate RNA secondary structure, but these are either limited to pairwise alignment of one sequence with respect to a reference sequence and structure (Bafna et al.) or are too computationally intensive to be applicable to sequences longer than about 150 nucleotides (Eddy and Durbin, 1994). Given the current lack of automatic methods for aligning RNA sequences, we could ask how well standard alignment programs perform. Hickson et al. (2000) addressed this question using a suite of ten conserved motifs to score the alignments produced by five different programs. They employed a reference alignment for 10 mitochondrial 12S rRNA sequences constructed manually using conserved motifs (Hickson et al., 1996). In their discussion the authors noted (p. 535) that the honeybee sequence caused the five alignment programs the most difficulty. Comparing their reference alignment (their fig. 1) with alignments in the small subunit RNA database (van de Peer et al., 2000) and with an alignment of insect 12S rRNA secondary structure (unpublished data) makes it clear that Hickson et al. have incorrectly aligned the honeybee sequence between motifs 7 and 10. They identify the five bases UGAAA at positions 14866-14870 in the honeybee mitochondrial genome (Crozier and Crozier, 1993) as motif 8. Doing this results in a lengthy insertion in the honeybee sequence upstream from motif 8, and a corresponding deletion upstream of motif 10 (Figure 1). Their alignment also shows a single gap in motif 9, which includes helix 33' in Hickson et al.'s (1996) secondary structure model.
No other sequence in their alignment (or the larger one they published in 1996) has a gap in this highly conserved helix. They also postulate a large deletion in helix 48, which removes the loop and part of the 3' stem from this helix (Figure 1). These violations of conserved structures in 12S rRNA cast serious doubt on the alignment. I suggest that the authors have placed too much reliance on motif 8 ("yrgrr") being conserved across all taxa. To investigate the apparent misalignment further, I used the program RAGA (Notredame et al., …
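The risk of anchoring an alignment on a short degenerate motif can be illustrated with a minimal sketch. It assumes that the letters in "yrgrr" are standard IUPAC ambiguity codes (y = pyrimidine, r = purine); the helper names `find_motif` and `chance_probability` and the toy sequence are hypothetical, invented for illustration, and are not taken from Hickson et al. or from any alignment program.

```python
import re

# IUPAC codes for the RNA alphabet. Reading "yrgrr" as IUPAC notation is an
# assumption made for this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG",    # purine
    "Y": "CU",    # pyrimidine
    "N": "ACGU",  # any base
}

def motif_to_regex(motif):
    """Translate an IUPAC motif such as 'yrgrr' into a compiled regex."""
    classes = ["[" + IUPAC[ch] + "]" for ch in motif.upper()]
    # A lookahead is used so that overlapping matches are all reported.
    return re.compile("(?=(" + "".join(classes) + "))")

def find_motif(sequence, motif):
    """Return the 0-based start of every (possibly overlapping) match."""
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    return [m.start() for m in motif_to_regex(motif).finditer(seq)]

def chance_probability(motif):
    """Probability of a match at one position, assuming uniform base use."""
    p = 1.0
    for ch in motif.upper():
        p *= len(IUPAC[ch]) / 4
    return p

# Hypothetical sequence, invented for illustration only.
toy = "AUCGGAGAUUAGAAUUGGGACCUAGGGUAA"
hits = find_motif(toy, "yrgrr")
print(hits)                         # start positions of candidate sites
print(chance_probability("yrgrr"))  # per-position chance of a match
```

Under the uniform-composition assumption, a five-base motif with four two-fold ambiguous positions matches by chance about once every 64 positions, so a single sequence region can offer several candidate sites. Choosing among them requires additional evidence, such as the surrounding secondary structure.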