Human Protein Function Prediction using Decision Tree Induction

Authors

  • Manpreet Singh
  • Parminder Kaur Wadhwa
  • Parvinder Singh Sandhu
Abstract

Protein data is growing exponentially, so drug discovery needs efficient machine learning techniques to predict the functions of proteins that are responsible for various diseases in the human body. The existing decision tree induction method, C4.5, selects the best attribute using an entropy calculation. The proposed method is a new decision tree induction technique that instead uses an uncertainty measure for best attribute selection, based on a study of priority-based packages of SDFs (Sequence Derived Features). The resulting decision tree is deeper than the one produced by the existing C4.5 technique. A deeper tree applies more tests before assigning a functional class and thus yields more accurate predictions than the existing prediction technique. On the same test data, the new HPF (Human Protein Function) predictor reaches an accuracy of 72%, compared with 44% for the existing prediction technique.
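The entropy-based attribute selection that the abstract attributes to C4.5 can be illustrated with a minimal sketch. It covers only the baseline: the abstract does not define the proposed uncertainty measure, so that measure is not reproduced here. The feature names (`hydrophobicity`, `length`), the class labels, and the four toy rows are hypothetical and do not come from the paper; the gain ratio shown is the standard C4.5 criterion, not code from the authors.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def information_gain(rows, labels, attr):
    """Reduction in class entropy after splitting on attribute `attr`."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio: information gain divided by split information."""
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split
    return information_gain(rows, labels, attr) / split_info


def best_attribute(rows, labels, attrs):
    """Pick the attribute with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical toy data: two discretized sequence-derived features per protein.
rows = [
    {"hydrophobicity": "high", "length": "long"},
    {"hydrophobicity": "high", "length": "short"},
    {"hydrophobicity": "low", "length": "long"},
    {"hydrophobicity": "low", "length": "short"},
]
labels = ["receptor", "receptor", "enzyme", "enzyme"]

chosen = best_attribute(rows, labels, ["hydrophobicity", "length"])
print(chosen)
```

In this toy set, `hydrophobicity` separates the two classes perfectly while `length` carries no class information, so the selector picks `hydrophobicity` as the root test. A tree builder would then apply the same selection recursively to each subset.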


Similar resources

DIAGNOSIS OF BREAST LESIONS USING THE LOCAL CHAN-VESE MODEL, HIERARCHICAL FUZZY PARTITIONING AND FUZZY DECISION TREE INDUCTION

Breast cancer is one of the leading causes of death among women. Mammography remains today the best technology to detect breast cancer, early and efficiently, to distinguish between benign and malignant diseases. Several techniques in image processing and analysis have been developed to address this problem. In this paper, we propose a new solution to the problem of computer aided detection and...


A New Acceptance Sampling Design Using Bayesian Modeling and Backwards Induction

In acceptance sampling plans, the decision to accept or reject a specific batch is still a challenging problem. In order to provide a desired level of protection for customers as well as manufacturers, in this paper, a new acceptance sampling design is proposed to accept or reject a batch based on Bayesian modeling to update the distribution function of the percentage of nonconfor...


Decision Tree Studies in Estimating Breast Cancer Risk Using Single Nucleotide Polymorphisms

Introduction: Decision trees are data mining tools for collecting, accurately predicting, and sifting information from massive amounts of data, and they are widely used in computational biology and bioinformatics. In bioinformatics, they can be used to predict diseases, including breast cancer. The use of genomic data, including single nucleotide polymorphisms, is a very important ...


Human Protein Function Prediction from Sequence Derived Features using See5

Drug discovery is a tedious process that involves many iterations and different processes before final approval. The present work focuses on predicting the molecular class of an unknown protein. The sequence data is taken from HPRD (Human Protein Reference Database), and different features are then explored for each molecular sequence using various online tools. The decision tree w...


Protein Structure Prediction and Interpretation with Support Vector Machines and Decision Trees

Prediction of protein structures from protein sequences using computers is an important step to discover proteins' 3D conformation structures and their functions and hence has profound theoretical and practical significance in areas such as protein engineering and drug design. In this talk, we will discuss our new results in protein secondary structure and Transmembrane protein prediction using...



Publication date: 2007