MiRmat: Mature microRNA Sequence Prediction

Authors

  • Chenfeng He
  • Ying-Xin Li
  • Guangxin Zhang
  • Zuguang Gu
  • Rong Yang
  • Jie Li
  • Zhi John Lu
  • Zhi-Hua Zhou
  • Chenyu Zhang
  • Jin Wang
Abstract

BACKGROUND: MicroRNAs are generated from primary transcripts mainly through sequential cleavage by two enzymes, Drosha and Dicer. The sequence of a mature microRNA, especially the seed sequence, largely determines its binding ability and specificity to target mRNAs. Methods that predict mature microRNA sequences with high accuracy will therefore benefit the identification and characterization of novel microRNAs and their targets, and contribute to inferring the post-transcriptional regulation network at a genome scale.

METHODOLOGY/PRINCIPAL FINDINGS: We have developed a method, MiRmat, to predict the mature microRNA sequence. MiRmat consists of two parts: prediction of the Drosha processing site and identification of the Dicer processing site. Based on an analysis of microRNAs from 12 species, we found that the patterns of free energy profiles are conserved among vertebrate microRNA hairpins. We therefore combined the free energy distribution pattern of the downstream part of the pri-microRNA secondary structure with the Random Forest algorithm to predict the mature microRNA sequence. In an evaluation on an independent test dataset from 10 vertebrates, MiRmat identified 77.8% of the Drosha processing sites and 92.8% of the Dicer sites within a deviation of 2 nt. In a more stringent evaluation that excluded microRNAs belonging to families shared between the training set and the test set, MiRmat still identified 71.9% of the Drosha sites and 87.2% of the Dicer sites, which indicates that it can handle novel microRNA families. MiRmat outperforms other state-of-the-art methods and is highly effective at predicting mature microRNA sequences in vertebrates.

CONCLUSION: MiRmat was developed to identify mature microRNA sequence(s) by combining the free energy distribution of the RNA stem-loop structure with the Random Forest algorithm. We show that MiRmat performs better than existing tools and is applicable across vertebrates. MiRmat is freely available at http://mcube.nju.edu.cn/jwang/lab/soft/MiRmat/.
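The identification rates quoted in the abstract are fractions of processing sites predicted within 2 nt of the annotated site. A minimal sketch of how such a rate could be computed is given below; the function name, the use of integer positions along the hairpin, and the example numbers are all illustrative assumptions and are not taken from MiRmat.

```python
from typing import Sequence


def identification_rate(
    predicted: Sequence[int],
    annotated: Sequence[int],
    max_deviation: int = 2,
) -> float:
    """Fraction of predicted sites within max_deviation nt of the annotated site.

    Positions are assumed to be integer nucleotide coordinates on the hairpin;
    this is a hypothetical helper, not part of the MiRmat software.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not predicted:
        raise ValueError("at least one site is required")
    hits = sum(
        1 for p, a in zip(predicted, annotated) if abs(p - a) <= max_deviation
    )
    return hits / len(predicted)


# Made-up positions for five hairpins (illustration only, not real data).
predicted_sites = [21, 35, 18, 40, 27]
annotated_sites = [22, 35, 22, 38, 27]

# Deviations are 1, 0, 4, 2, 0 nt, so four of the five sites count as hits.
rate = identification_rate(predicted_sites, annotated_sites)
print(f"{rate:.1%}")
```

Under this definition, a prediction that is off by exactly 2 nt still counts as identified, which matches the phrase "within a deviation of 2 nt" but is otherwise an assumption about how the boundary is treated.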


Similar resources

Genome-wide computational prediction of miRNAs in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) revealed target genes involved in pulmonary vasculature and antiviral innate immunity

The current outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in China has threatened humankind worldwide. Coronaviruses contain the largest RNA genomes among all known RNA viruses; therefore, the disease etiology can be understood by analyzing the genome sequence of SARS-CoV-2. In this study, we used an ab initio based computational tool, VMir, to scan the complete geno...


A Probabilistic Method for Prediction of microRNA-target Interactions

Elucidation of microRNA activity is a crucial step in understanding gene regulation. One key problem in this effort is how to model the pairwise interaction of microRNAs with their targets. As this interaction is strongly mediated by their sequences, it is desirable to set up a probabilistic model to explain the binding between a microRNA sequence and the sequence of a putative target. To this en...


High-throughput amplification of mature microRNAs in uncharacterized animal models using polyadenylated RNA and stem-loop reverse transcription polymerase chain reaction.

This study makes a significant advancement on a microRNA amplification technique previously used for expression analysis and sequencing in animal models without annotated mature microRNA sequences. As research progresses into the post-genomic era of microRNA prediction and analysis, the need for a rapid and cost-effective method for microRNA amplification is critical to facilitate wide-scale an...


Distribution of Mature MicroRNA on Its Precursor: A New Character for MicroRNA Prediction

Background: MicroRNA (miRNA) is a large family of 20~22 nucleotide non-coding RNAs, which regulates expression of protein-coding genes. The stem-loop structure is an important characteristic of the miRNA precursor for computational identification of miRNA genes and has been shown to be closely related to miRNA biogenesis. Methods: This paper statistically analyzed the hairpin structures of 557 miRNA genes from six eukaryotic organisms...


Identification of microRNA precursors with new sequence-structure features

MicroRNAs are an important subclass of non-coding RNAs (ncRNA) and serve as main players in RNA interference (RNAi). Mature microRNAs are derived from a stem-loop structure called the precursor. Identification of precursor microRNA (pre-miRNA) is an essential step in locating microRNAs in a whole genome. The present work proposed 25 novel local features for identifying the stem-loop structure of pre-miRNAs, which c...



Volume 7

Publication date: 2012