Distributed heterogeneous ensemble learning on Apache Spark for ligand-based virtual screening
Abstract
Virtual screening is one of the most common computer-aided drug design techniques that apply computational tools and methods on large libraries of molecules to extract drugs. Ensemble learning is a recent paradigm launched to improve machine learning results in terms of predictive performance and robustness. It has been successfully applied in ligand-based virtual screening (LBVS) approaches. Applying ensemble learning on huge molecular libraries is computationally expensive. Hence, the distribution and parallelisation of the task have become a significant step, using sophisticated frameworks such as Apache Spark. In this paper, we propose a new approach, HEnsL_DLBVS, for heterogeneous ensemble learning, distributed on Spark, to improve large-scale LBVS results. To handle the problem of imbalanced big training datasets, we propose a novel hybrid technique. We generate training datasets to evaluate the approach. Experimental results confirm the effectiveness of our approach, with satisfactory accuracy and its superiority over homogeneous models.
Similar resources
Efficient iterative virtual screening with Apache Spark and conformal prediction
BACKGROUND Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. CONTRIBUTION In this study we propose a strategy that is b...
Flare: Native Compilation for Heterogeneous Workloads in Apache Spark
The need for modern data analytics to combine relational, procedural, and map-reduce-style functional processing is widely recognized. State-of-the-art systems like Spark have added SQL front-ends and relational query optimization, which promise an increase in expressiveness and performance. But how good are these extensions at extracting high performance from modern hardware platforms? While S...
Large-scale virtual screening on public cloud resources with Apache Spark
BACKGROUND Structure-based virtual screening is an in-silico method to screen a target receptor against a virtual molecular library. Applying docking-based screening to large molecular libraries can be computationally expensive; however, it constitutes a trivially parallelizable task. Most of the available parallel implementations are based on message passing interface, relying on low failure ra...
MLlib: Machine Learning in Apache Spark
Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shi...
Evaluation of machine-learning methods for ligand-based virtual screening
Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it ...
Journal
Journal title: International Journal of Data Mining, Modelling and Management
Year: 2021
ISSN: 1759-1171, 1759-1163
DOI: https://doi.org/10.1504/ijdmmm.2021.10035119