Lightweight comparison of RNAs based on exact sequence–structure matches
Abstract
MOTIVATION: Specific functions of ribonucleic acid (RNA) molecules are often associated with different motifs in the RNA structure. The key feature that forms such an RNA motif is the combination of sequence and structure properties. In this article, we introduce a new RNA sequence-structure comparison method that preserves exactly matching substructures. Existing common substructures are treated as whole units, while variability is allowed between such structural motifs. Based on a quickly detectable set of overlapping and crossing substructure matches for two nested RNA secondary structures, our method ExpaRNA (exact pattern of alignment of RNA) computes the longest collinear sequence of substructures common to two RNAs in O(H·nm) time and O(nm) space, where H ≪ n·m for real RNA structures. Applied to different RNAs, our method correctly identifies sequence-structure similarities between two RNAs.

RESULTS: We have compared ExpaRNA with two other alignment methods that work with given RNA structures, namely RNAforester and RNA_align. The results are in good agreement, but ExpaRNA obtains them in a fraction of the running time, in particular for larger RNAs. We have also used ExpaRNA to speed up state-of-the-art Sankoff-style alignment tools like LocARNA, and observe a tradeoff between quality and speed. However, we get a speedup of 4.25 even in the highest quality setting, where the quality of the produced alignment is comparable to that of LocARNA alone.

AVAILABILITY: The presented algorithm is implemented in the program ExpaRNA, which is available from our website (http://www.bioinf.uni-freiburg.de/Software).
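The idea of selecting a longest collinear, non-overlapping set of exact matches can be illustrated with a deliberately simplified sketch. It is not the ExpaRNA algorithm: it ignores base pairs and nesting entirely, treats each match as a plain interval pair on two sequences, and runs in time quadratic in the number of matches rather than O(H·nm). The `Match` tuple layout and the function name `best_collinear_chain` are hypothetical, chosen only for this illustration.

```python
from typing import List, Tuple

# Hypothetical representation of one exact match:
# (start in RNA A, start in RNA B, length), with length >= 1.
Match = Tuple[int, int, int]


def best_collinear_chain(matches: List[Match]) -> List[Match]:
    """Return a chain of matches with maximum total length such that
    consecutive matches do not overlap and keep the same order in A and B.

    Simplification: structure (base pairs, nesting) is ignored here.
    """
    ordered = sorted(matches)  # sort by start in A, then start in B
    if not ordered:
        return []

    n = len(ordered)
    score = [m[2] for m in ordered]  # best chain length ending at each match
    prev = [-1] * n                  # back-pointers for chain recovery

    for j in range(n):
        a_j, b_j, len_j = ordered[j]
        for i in range(j):
            a_i, b_i, len_i = ordered[i]
            # Match i may precede match j only if it ends before j
            # starts in both sequences (collinear and non-overlapping).
            fits = a_i + len_i <= a_j and b_i + len_i <= b_j
            if fits and score[i] + len_j > score[j]:
                score[j] = score[i] + len_j
                prev[j] = i

    # Trace back from the best-scoring end point.
    k = max(range(n), key=score.__getitem__)
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]


example = [(0, 0, 3), (2, 5, 4), (4, 4, 2), (7, 8, 3), (5, 1, 5)]
print(best_collinear_chain(example))
```

In this toy input, the matches `(2, 5, 4)` and `(5, 1, 5)` are long but either overlap or cross the others, so the chain keeps three shorter matches whose combined length is larger.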
Similar resources
Introducing a Lightweight Structural Model via Simulation of Vernacular “Pa Tu Pa” Arch
The knowledge of Iranian vernacular structures is based on geometry, and there is a possibility of recreating such structural patterns aimed at producing movable structures. The purpose of this research was to utilize the patterns of vernacular structures to provide a lightweight structural model. The questions raised included how to create various forms based on the structural history of any r...
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics
MOTIVATION RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search ...
Computational Identification of Micro RNAs and Their Transcript Target(s) in Field Mustard (Brassica rapa L.)
Background: Micro RNAs (miRNAs) are a pivotal part of non-protein-coding endogenous small RNA molecules that regulate the genes involved in plant growth and development, and respond to biotic and abiotic environmental stresses posttranscriptionally. Objective: In the present study, we report the results of a systemic search for identification of new miRNAs in B. rapa using homology-based ...
Comment on “Systematic identification and evolutionary features of rhesus monkey small nucleolar RNAs”
In their article “Systematic identification and evolutionary features of rhesus monkey small nucleolar RNAs”, Zhang et al. [BMC Genomics, 11: 61 (2010)] report on the discovery of 117 rhesus monkey non-coding RNAs, of which eight remained unannotated. In this commentary, these sequences are revisited and annotations are derived for seven of the eight “unclassified ncRNA candidates”. Zhang et al...
Phylogenetic Analysis of Beta-Glucanase Producing Actinomycetes Strain TBG-CH22 - A Comparison of Conventional and Molecular Morphometric Approach
Actinomycetes are inexhaustible producers of commercially valuable metabolites and are continually screened for beneficial compounds. The taxonomic and phylogenetic study of novel actinomycetes strains is mostly based on conventional methods and the primary DNA structure of 16S rRNA. Although the 16S rRNA sequence is well accepted in phylogeny studies, its secondary structures have not been widely used. ...