MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features
Authors
Abstract
To distinguish real pre-miRNAs from other hairpin sequences with similar stem-loops (pseudo pre-miRNAs), we use a hybrid feature set consisting of the local contiguous structure-sequence composition, the minimum free energy (MFE) of the secondary structure, and the P-value of a randomization test. In addition, a novel machine-learning algorithm, random forest (RF), is introduced. The results suggest that our method achieves 98.21% specificity and 95.09% sensitivity. Compared with the previous method, Triplet-SVM-classifier, our RF method is nearly 10% higher in total accuracy. Further analysis indicates that the improvement is due to both the combined features and the RF algorithm. The MiPred web server is available at http://www.bioinf.seu.edu.cn/miRNA/. Given a sequence, MiPred first decides whether it is a pre-miRNA-like hairpin sequence. If it is, the RF classifier then predicts whether it is a real pre-miRNA or a pseudo one.
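The two less familiar ingredients of the hybrid feature set, the structure-sequence composition and the randomization-test P-value, can be illustrated with a minimal sketch. It is not the MiPred implementation. The encoding below (middle nucleotide plus the paired/unpaired state of three adjacent positions, giving 32 frequencies), the plain mononucleotide shuffle, the estimator (R + 1) / (N + 1), and the names `triplet_features`, `randomization_p_value`, and `toy_mfe` are all assumptions made for illustration. In particular, `toy_mfe` is a hypothetical stand-in for a real folding program and does not compute a free energy.

```python
import random
from itertools import product

NUCLEOTIDES = "ACGU"
# The 8 possible paired "(" / unpaired "." states of three adjacent positions.
STRUCT_TRIPLETS = ["".join(p) for p in product("(.", repeat=3)]
# 4 nucleotides x 8 structure triplets = 32 feature names (assumed encoding).
FEATURE_NAMES = [n + t for n in NUCLEOTIDES for t in STRUCT_TRIPLETS]


def triplet_features(seq, dot_bracket):
    """Return 32 frequencies of (middle nucleotide, structure triplet) elements."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper().replace("T", "U")
    # Collapse "(" and ")" into one "paired" symbol.
    struct = dot_bracket.replace(")", "(")
    counts = dict.fromkeys(FEATURE_NAMES, 0)
    total = 0
    for i in range(1, len(seq) - 1):
        key = seq[i] + struct[i - 1:i + 2]
        if key in counts:  # ambiguous bases such as N are skipped
            counts[key] += 1
            total += 1
    return [counts[name] / total if total else 0.0 for name in FEATURE_NAMES]


def randomization_p_value(seq, mfe_fn, n_shuffles=1000, seed=0):
    """Empirical P-value: how often a shuffled sequence is at least as stable.

    mfe_fn is any callable mapping a sequence to an energy (lower = more stable).
    Assumption: simple mononucleotide shuffling and the (R + 1) / (N + 1) estimator.
    """
    rng = random.Random(seed)
    native = mfe_fn(seq)
    letters = list(seq)
    at_least_as_stable = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if mfe_fn("".join(letters)) <= native:
            at_least_as_stable += 1
    return (at_least_as_stable + 1) / (n_shuffles + 1)


def toy_mfe(seq):
    """Hypothetical energy: minus the number of Watson-Crick pairs formed when
    the sequence is folded back on itself. NOT a real MFE calculation."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    return -float(sum((seq[i], seq[n - 1 - i]) in pairs for i in range(n // 2)))


if __name__ == "__main__":
    seq, struct = "GGGAAACCC", "(((...)))"
    features = triplet_features(seq, struct)
    p_value = randomization_p_value(seq, toy_mfe, n_shuffles=200)
    # Hybrid vector: triplet frequencies, then energy, then P-value.
    hybrid = features + [toy_mfe(seq), p_value]
    print(len(hybrid), round(sum(features), 6), p_value)
```

In practice `mfe_fn` would wrap an RNA folding tool, and the resulting hybrid vector would be passed to a random forest implementation for training and prediction.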
Related resources
Benchmark comparison of ab initio microRNA identification methods and software.
MicroRNAs (miRNAs) are short, non-coding RNA molecules that play an important role in the world of genes, especially in regulating the gene expression of target messenger RNAs through cleavage or translational repression of messenger RNA. Ab initio methods have become popular in computational miRNA detection. Most software tools are designed to distinguish miRNA precursors from pseudo-hai...
The prediction of Persian Squirrel Distribution Using a Combined Modeling Approach in the Forest Landscapes of Luristan Province
Habitat destruction is the most important factor determining species extinction; hence, the management of wildlife populations necessitates the management of habitats. Habitat suitability modeling is one of the best tools used for habitat management. There are several methods for habitat suitability modeling, each having different advantages and disadvantages. In this study, we us...
Accuracy Improvement of Mood Disorders Prediction using a Combination of Data Mining and Meta-Heuristic Algorithms
Introduction: Since the delay or mistake in the diagnosis of mood disorders due to the similarity of their symptoms hinders effective treatment, this study aimed to accurately diagnose mood disorders including psychosis, autism, personality disorder, bipolar, depression, and schizophrenia, through modeling and analyzing patients' data. Method: Data collected in this applied developmental resear...
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach using random forest as a base classifier to classify the risk of low-back disorders (LBDs) associated with industrial jobs. The semi-supervised classification approach uses unlabeled data together with a small amount of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...