Distributed heterogeneous ensemble learning on Apache Spark for ligand-based virtual screening

Abstract

Virtual screening is one of the most common computer-aided drug design techniques; it applies computational tools and methods to large libraries of molecules to extract drugs. Ensemble learning is a recent paradigm launched to improve machine learning results in terms of predictive performance and robustness. It has been successfully applied in ligand-based virtual screening (LBVS) approaches. Applying ensemble learning to huge molecular libraries is computationally expensive. Hence, distributing and parallelising the task with sophisticated frameworks such as Apache Spark has become a significant step. In this paper, we propose a new approach, HEnsL_DLBVS, for heterogeneous ensemble learning, distributed on Spark, to improve large-scale LBVS results. To handle the problem of imbalanced big training datasets, we propose a novel hybrid technique. We generate datasets to evaluate the approach. Experimental results confirm the effectiveness of our approach, with satisfactory accuracy, and its superiority over homogeneous models.

Similar resources

Efficient iterative virtual screening with Apache Spark and conformal prediction

BACKGROUND Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. CONTRIBUTION In this study we propose a strategy that is b...

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While S...

Large-scale virtual screening on public cloud resources with Apache Spark

BACKGROUND Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure ra...

MLlib: Machine Learning in Apache Spark

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shi...

Evaluation of machine-learning methods for ligand-based virtual screening

Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it ...

Journal

Journal title: International Journal of Data Mining, Modelling and Management

Year: 2021

ISSN: 1759-1171, 1759-1163

DOI: https://doi.org/10.1504/ijdmmm.2021.10035119