Improving Protein Interactions Prediction Using Machine Learning and Visual Analytics

Authors

  • Mudita Singhal
  • John H. Miller
Abstract

By Mudita Singhal, Ph.D., Washington State University, December 2007. Chair: John H. Miller.

The response of biological systems to external stimuli is governed by their cellular interaction networks. Inferring these networks is therefore essential to deciphering the basic operational principles of biological systems. Knowing which proteins exist in a certain organism or cell type, and how these proteins interact with each other, is necessary for understanding biological processes at the whole-cell level. The determination of protein-protein interaction (PPI) networks has been the subject of extensive research, and it has been shown that domain-domain interactions (DDIs) are good indicators of possible protein interactions and can predict protein interactions more accurately than comparing full-length protein sequences. Despite the development of reasonably successful methods, there is definite scope for improvement. This thesis is aimed at developing machine learning based computational techniques that use domain information in the proteins to predict PPI networks.

This research aims to make four major contributions to the field of PPIs. The first two are the development of two new PPI prediction algorithms, DomainGA and DomainSVM. DomainGA is a genetic algorithm based multi-parameter optimization method that quantifies DDIs and uses them to predict PPIs. The second method, DomainSVM, uses the DDI scores obtained from DomainGA in a Support Vector Machine (SVM) based learning system to improve PPI prediction by overcoming the limitations of DomainGA. These two methods can be used as a two-step filtering process to validate experimentally detected PPIs. The third contribution is the assignment of scores to DDIs, which is proven to discriminate between positive and negative PPIs.
Finally, the fourth contribution is a visual analytics environment called CABIN (Collective Analysis of Biological Interaction Networks), which provides a one-of-its-kind tool to analyze, compare, and integrate multiple predicted networks obtained from public data sources and/or inference algorithms such as DomainGA and DomainSVM. The predicted interactions, accompanied by a confidence score and an exploratory visualization environment, should help researchers validate experimental observations and/or make informed decisions while generating hypotheses and models for designing new experiments.
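
The two-step filtering idea described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the abstract does not say how DDI scores are combined into a protein-pair score, so the rule used here (take the maximum DDI score over all domain pairs), the 0.5 threshold, the function names, and the toy proteins, domains, and scores are all assumptions made for illustration. The second stage is a placeholder callable standing in for a trained classifier such as an SVM.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Tuple

# Unordered domain pair -> interaction score (hypothetical values).
DDIScores = Dict[FrozenSet[str], float]


def pair_score(domains_a: Sequence[str], domains_b: Sequence[str],
               ddi_scores: DDIScores, default: float = 0.0) -> float:
    """Score a protein pair as the maximum DDI score over all domain pairs.

    The max rule is an assumed combination rule, not one taken from the thesis.
    Unknown domain pairs, and proteins with no domains, score `default`.
    """
    return max(
        (ddi_scores.get(frozenset((da, db)), default)
         for da, db in product(domains_a, domains_b)),
        default=default,
    )


def two_step_filter(candidates: Iterable[Tuple[str, str]],
                    protein_domains: Dict[str, Sequence[str]],
                    ddi_scores: DDIScores,
                    second_stage: Callable[[str, str, float], bool],
                    threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Keep candidate pairs that pass a score threshold and then a second check."""
    kept = []
    for a, b in candidates:
        score = pair_score(protein_domains[a], protein_domains[b], ddi_scores)
        # Step 1: DDI-score threshold. Step 2: second-stage classifier.
        if score >= threshold and second_stage(a, b, score):
            kept.append((a, b, score))
    return kept


# Toy data, entirely hypothetical.
protein_domains = {
    "P1": ["SH3", "kinase"],
    "P2": ["proline_rich"],
    "P3": ["zinc_finger"],
}
ddi_scores = {
    frozenset(("SH3", "proline_rich")): 0.9,
    frozenset(("kinase", "zinc_finger")): 0.2,
}
candidates = [("P1", "P2"), ("P1", "P3"), ("P2", "P3")]

# Placeholder second stage that accepts everything; a trained classifier
# would replace this callable.
result = two_step_filter(candidates, protein_domains, ddi_scores,
                         second_stage=lambda a, b, s: True)
print(result)  # [('P1', 'P2', 0.9)]
```

Passing the second stage in as a callable keeps the sketch free of third-party dependencies while leaving room for any classifier to be plugged in.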

Similar resources

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

A DNA sequence, containing all genetic traits, is not a functional entity. Instead, it is converted to protein sequences through the transcription and translation processes. This protein sequence later takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...

Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks

Background: Prediction of protein localization is among the most important issues in bioinformatics; it is used to predict the location of proteins in cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied to predict intracellular protein locations. These algorithms use the features extracted from pro...

The State-of-the-Art in Predictive Visual Analytics

Predictive analytics embraces an extensive range of techniques including statistical modeling, machine learning, and data mining and is applied in business intelligence, public health, disaster management and response, and many other fields. To date, visualization has been broadly used to support tasks in the predictive analytics pipeline. Primary uses have been in data cleaning, exploratory an...

Improving the Performance of Machine Learning Algorithms for Heart Disease Diagnosis by Optimizing Data and Features

The heart is one of the most important organs of the body, and heart disease is the major cause of death in the world and in Iran. This is why early, timely diagnosis is one of the key foundations for preventing and reducing deaths from this disease. So far, many studies have been done on heart disease with the aim of prediction, diagnosis, and treatment. However, most of them have been mostly f...

Predictive Visual Analytics – Approaches for Movie Ratings and Discussion of Open Research Challenges

We present two original approaches for visual-interactive prediction of user movie ratings and box office gross after the opening weekend, as designed and awarded during VAST Challenge 2013. Our approaches are driven by machine learning models and interactive data exploration, respectively. They consider an array of different training data types, including categorical/discrete data, time series...

Publication date: 2007