A Machine Learning Approach for MicroRNA Precursor Prediction in Retro-transcribing Virus Genomes

Authors

  • Muserref Duygu Saçar Demirci
  • Mustafa Toprak
  • Jens Allmer
Abstract

Efforts to identify microRNA (miRNA) precursors (pre-miRNAs) have increased in recent years. Because pre-miRNAs are difficult to detect experimentally, computational approaches are increasingly used. Most of these approaches rely on machine learning, especially classification. Successful classification depends on many factors, such as data quality, the choice of classifier settings, and feature selection. For feature selection, we developed a distributed genetic algorithm that runs on HTCondor. Moreover, we employed two widely used classification algorithms, libSVM and random forest, with different settings to analyze their influence on overall classification performance. In this study, we analyzed six human retro-transcribing virus genomes: Human endogenous retrovirus K113, Hepatitis B virus (strain ayw), Human T lymphotropic virus 1, Human T lymphotropic virus 2, Human immunodeficiency virus 2, and Human immunodeficiency virus 1. We then predicted pre-miRNAs using information from known viral and human pre-miRNAs. Our results indicate that these viruses produce novel, previously unknown miRNA precursors, which warrant further experimental validation.
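
The idea of genetic-algorithm feature selection mentioned in the abstract can be illustrated with a minimal, single-machine sketch. This is not the authors' distributed HTCondor implementation, and it does not use libSVM or random forest: to stay dependency-free it scores feature subsets with a simple nearest-centroid classifier on synthetic data. All names (`make_data`, `centroid_accuracy`, `fitness`, `select_features`) and all parameter values (population size, mutation rate, size penalty) are hypothetical and chosen only for illustration.

```python
import random


def make_data(rng, n=200, n_features=10, informative=(0, 3, 7)):
    """Synthetic two-class data; only the `informative` columns carry signal."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # shift the class-1 mean on informative features
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X, y):
        dist = {
            label: sum((x[j] - c) ** 2 for j, c in zip(cols, cent))
            for label, cent in centroids.items()
        }
        correct += min(dist, key=dist.get) == t
    return correct / len(y)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small (assumed) penalty per selected feature."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask)


def select_features(X, y, rng, pop_size=20, generations=20, p_mut=0.1):
    """Evolve bit masks over features; the top half of each generation survives."""
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # flip each bit independently with probability p_mut
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(X, y, m))


if __name__ == "__main__":
    X, y = make_data(random.Random(0))
    best = select_features(X, y, random.Random(1))
    print("selected feature indices:", [j for j, b in enumerate(best) if b])
```

In a distributed setting, the fitness evaluations within a generation are independent of one another, so they are a natural unit of work to hand to separate jobs; how the study actually partitioned the work on HTCondor is not stated in the abstract.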


Similar resources

Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes

The growing amount of information on biological sequences has made the application of statistical approaches necessary for modeling and estimating their functions. In this paper, the sensitivity and specificity of the first and second Markov chains for gene prediction were evaluated using complete double-stranded DNA virus genomes. There were two approaches for prediction of each Markov Model parameter,...


Combining Multi-Species Genomic Data for MicroRNA Identification Using a Naïve Bayes Classifier Machine Learning for Identification of MicroRNA Genes

Motivation: Numerous computational methodologies utilize techniques based on sequence conservation and/or structural similarity for microRNA gene prediction. In this study we describe a new technique, which is applicable across several species, for predicting microRNA genes. This technique is based on machine learning, using the Naïve Bayes classifier. This computational procedure automatically...


Transparent Machine Learning Algorithm Offers Useful Prediction Method for Natural Gas Density

Machine-learning algorithms aid predictions for complex systems with multiple influencing variables. However, many neural-network related algorithms behave as black boxes in terms of revealing how the prediction of each data record is performed. This drawback limits their ability to provide detailed insights concerning the workings of the underlying system, or to relate predictions to specific ...


Stock Price Prediction using Machine Learning and Swarm Intelligence

Background and Objectives: Stock price prediction has become one of the interesting and also challenging topics for researchers in the past few years. Due to the non-linear nature of the time-series data of the stock prices, mathematical modeling approaches usually fail to yield acceptable results. Therefore, machine learning methods can be a promising solution to this problem. Methods: In this...


PREDICTION OF SLOPE STABILITY STATE FOR CIRCULAR FAILURE: A HYBRID SUPPORT VECTOR MACHINE WITH HARMONY SEARCH ALGORITHM

The slope stability analysis is routinely performed by engineers to estimate the stability of river training works, road embankments, embankment dams, excavations and retaining walls. This paper presents a new approach to build a model for the prediction of slope stability state. The support vector machine (SVM) is a new machine learning method based on statistical learning theory, which can so...



Journal:
  • Journal of Integrative Bioinformatics

Volume 13, Issue 5

Pages: -

Publication date: 2016