An ensemble model of QSAR tools for regulatory risk assessment
Abstract
Quantitative structure-activity relationships (QSARs) are theoretical models that relate a quantitative measure of chemical structure to a physical property or a biological effect. QSAR predictions can be used in chemical risk assessment to protect human and environmental health, which makes them of interest to regulators, especially when experimental data are absent. For regulatory use, QSAR models should be transparent, reproducible and optimized to minimize the number of false negatives. In silico QSAR tools are gaining wide acceptance as a faster alternative to time-consuming clinical and animal testing methods. However, different QSAR tools often make conflicting predictions for a given chemical, and their predictive performance may vary across chemical datasets. In a regulatory context, conflicting predictions raise interpretation, validation and adequacy concerns. To address these concerns, ensemble learning techniques from machine learning can be used to integrate predictions from multiple tools. Because it draws on several underlying QSAR algorithms and training datasets, the resulting consensus prediction should yield better overall predictive ability. We present a novel ensemble QSAR model based on Bayesian classification. The model exposes a cut-off parameter that lets the user select the desired trade-off between sensitivity and specificity. The predictive performance of the ensemble model is compared with that of four in silico tools (Toxtree, Lazar, OECD Toolbox, and Danish QSAR) in predicting carcinogenicity for a dataset of air toxins (332 chemicals) and a subset of the Gold Carcinogenic Potency Database (480 chemicals).
Leave-one-out cross-validation results show that the ensemble model achieves the best trade-off between sensitivity and specificity (accuracy: 83.8% and 80.4%; balanced accuracy: 80.6% and 80.8%) and the highest inter-rater agreement (kappa, κ: 0.63 and 0.62) on both datasets. The ROC curves demonstrate the utility of the cut-off feature for the predictive ability of the ensemble model. This feature gives regulators additional control in grading a chemical according to the severity of the toxic endpoint under study.
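The idea of combining tool outputs with a Bayesian classifier and a tunable cut-off can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes binary (0/1) tool predictions, treats the tools as conditionally independent given the true class (a naive Bayes simplification), and uses Laplace smoothing. All function names (`fit_tool_rates`, `posterior_positive`, `classify`, `leave_one_out`, `evaluate`) and the `alpha` parameter are hypothetical and introduced only for illustration.

```python
from math import prod


def fit_tool_rates(tool_preds, labels, alpha=1.0):
    """Estimate the class prior and each tool's sensitivity/specificity.

    tool_preds: list of tuples, one 0/1 prediction per tool per chemical.
    labels:     list of true 0/1 classes (1 = carcinogen).
    alpha:      Laplace smoothing constant (an assumption of this sketch).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    prior = (n_pos + alpha) / (len(labels) + 2 * alpha)
    rates = []
    for k in range(len(tool_preds[0])):
        tp = sum(1 for p, y in zip(tool_preds, labels) if y == 1 and p[k] == 1)
        tn = sum(1 for p, y in zip(tool_preds, labels) if y == 0 and p[k] == 0)
        sens = (tp + alpha) / (n_pos + 2 * alpha)
        spec = (tn + alpha) / (n_neg + 2 * alpha)
        rates.append((sens, spec))
    return prior, rates


def posterior_positive(pred, prior, rates):
    """P(carcinogen | tool predictions), assuming independent tools."""
    like_pos = prod(s if p == 1 else 1 - s for p, (s, _) in zip(pred, rates))
    like_neg = prod(1 - sp if p == 1 else sp for p, (_, sp) in zip(pred, rates))
    num = prior * like_pos
    return num / (num + (1 - prior) * like_neg)


def classify(pred, prior, rates, cutoff=0.5):
    """Call a chemical positive when the posterior reaches the cut-off.

    Lowering the cut-off favours sensitivity (fewer false negatives);
    raising it favours specificity.
    """
    return 1 if posterior_positive(pred, prior, rates) >= cutoff else 0


def leave_one_out(tool_preds, labels, cutoff=0.5):
    """Predict each chemical from a model fitted on all the others."""
    out = []
    for i in range(len(labels)):
        train_p = tool_preds[:i] + tool_preds[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        prior, rates = fit_tool_rates(train_p, train_y)
        out.append(classify(tool_preds[i], prior, rates, cutoff))
    return out


def evaluate(y_true, y_pred):
    """Sensitivity, specificity, accuracy, balanced accuracy, Cohen's kappa.

    Assumes both classes occur in y_true (otherwise a rate is undefined).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (acc - p_e) / (1 - p_e) if p_e < 1 else 0.0
    return {"sensitivity": sens, "specificity": spec, "accuracy": acc,
            "balanced_accuracy": (sens + spec) / 2, "kappa": kappa}
```

Calling `leave_one_out` over a range of `cutoff` values and passing each result to `evaluate` yields the sensitivity and specificity pairs from which a ROC curve can be drawn. In a regulatory setting a cut-off below 0.5 would be a natural choice when false negatives are the costlier error.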