Deep Learning for Drug Target Prediction

Authors

  • Thomas Unterthiner
  • Andreas Mayr
  • Günter Klambauer
  • Marvin Steijaert
  • Jörg K. Wegner
  • Hugo Ceulemans
Abstract

Target prediction is an important computational tool in drug design: for a given chemical structure, the interacting biomolecules (e.g. proteins) must be identified. A chemical structure interacts with different biomolecules if they have similar 3D structures. Thus, the outputs of the prediction are highly interdependent. Furthermore, the molecules are only partially labelled, since not every training molecule has been tested for activity on each biomolecule. The Merck Kaggle challenge on chemical compound activity was won by Hinton’s group with deep networks. This indicates the high potential of deep learning in drug design and attracted the attention of big pharma. However, the unrealistically small scale of the Kaggle dataset makes it impossible to assess the value of deep learning in drug target prediction when applied to the in-house data of pharmaceutical companies. Even a publicly available drug activity database like ChEMBL is orders of magnitude larger than the Kaggle dataset. ChEMBL has 13 M compound descriptors, 1.3 M compounds, and 5 k drug targets, compared to the Kaggle dataset with 11 k descriptors, 164 k compounds, and 15 drug targets. On the ChEMBL database, we compared the performance of deep learning to seven target prediction methods, including two commercial predictors, three predictors deployed by pharma, and machine learning methods that we could scale to this dataset. Deep learning outperformed all other methods with respect to the area under the ROC curve and was significantly better than all commercial products. Deep learning surpassed the threshold to make virtual compound screening possible and has the potential to become a standard tool in industrial drug design.
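
The abstract notes that molecules are only partially labelled across targets. One common way to train a multi-output predictor under that condition is to average the loss over measured (compound, target) pairs only. The sketch below illustrates this with NumPy; it is not taken from the paper, and the function name `masked_bce`, the 0/1 mask convention, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np

def masked_bce(probs, labels, mask, eps=1e-12):
    """Mean binary cross-entropy over measured (compound, target) pairs.

    probs, labels, mask: arrays of shape (n_compounds, n_targets).
    mask is 1.0 where an activity was measured and 0.0 otherwise
    (hypothetical convention; unmeasured labels hold any finite placeholder).
    """
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    per_entry = -(labels * np.log(probs) + (1.0 - labels) * np.log(1.0 - probs))
    n_measured = mask.sum()
    if n_measured == 0:
        return 0.0  # nothing measured, so nothing contributes to the loss
    return float((per_entry * mask).sum() / n_measured)

# Toy data (made up): 3 compounds, 2 targets, only 3 of 6 pairs measured.
probs = np.array([[0.9, 0.2], [0.5, 0.5], [0.1, 0.8]])
labels = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
mask = np.array([[1.0, 0.0], [0.0, 0.0], [1.0, 1.0]])

loss = masked_bce(probs, labels, mask)
print(loss)
```

Because unmeasured entries are multiplied by zero, their placeholder labels have no effect on the result, so a compound measured on a single target still contributes to training for that target.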

Similar Resources

Multi-Task Deep Networks for Drug Target Prediction

Target prediction is an important computational tool in drug design: for a given chemical structure, the interacting biomolecules (e.g. proteins) must be identified. A chemical structure interacts with different biomolecules if they have similar 3D structures. Thus, the outputs of the prediction are highly interdependent. Furthermore, we have partially labelled molecules...

DeepDTA: Deep Drug-Target Binding Affinity Prediction

The identification of novel drug-target (DT) interactions is a substantial part of the drug discovery process. Most of the computational methods that have been proposed to predict DT interactions have focused on binary classification, where the goal is to determine whether a DT pair interacts or not. However, protein-ligand interactions assume a continuum of binding strength values, also called...

Deep Learning as an Opportunity in Virtual Screening

Deep learning excels in vision and speech applications, where it has pushed the state-of-the-art to a new level. However, its impact on other fields remains to be shown. The Merck Kaggle challenge on chemical compound activity was won by Hinton’s group with deep networks. This indicates the high potential of deep learning in drug design and attracted the attention of big pharma. However, the unrealist...

A Template-Based Protein Structure Reconstruction Method Using Deep Autoencoder Learning

Protein structure prediction is an important problem in computational biology, and is widely applied to various biomedical problems such as protein function study, protein design, and drug design. In this work, we developed a novel deep learning approach based on a deeply stacked denoising autoencoder for protein structure reconstruction. We applied our approach to a template-based protein stru...

Deep Unsupervised Domain Adaptation for Image Classification via Low Rank Representation Learning

Domain adaptation is a powerful technique when a large amount of labeled data with similar attributes is available in different domains. In real-world applications, there is a huge amount of data, but most of it is unlabeled. Domain adaptation is effective in image classification, where obtaining adequately labeled data is expensive and time-consuming. We propose a novel method named DALRRL, which consists of deep ...

Publication date: 2015