The use of the area under the ROC curve in the evaluation of machine learning algorithms
Abstract
In this paper we investigate the use of the receiver operating characteristic (ROC) curve for the evaluation of machine learning algorithms. In particular, we investigate the use of the area under the ROC curve (AUC) as a measure of classifier performance. The machine learning algorithms used are chosen to be representative of those in common use: two decision trees (C4.5 and Multiscale Classifier); two neural networks (Perceptron and Multi-layer Perceptron); and two statistical methods (K-Nearest Neighbours and a Quadratic Discriminant Function). The evaluation is done using six "real world" medical diagnostics data sets that contain varying numbers of inputs and samples, but are primarily continuous input, binary classification problems. We identify three forms of bias that can affect comparisons of this type (estimation, selection, and expert bias) and detail the methods used to avoid them. We compare and discuss the use of AUC with the conventional measure of classifier performance, overall accuracy (the probability of a correct response). It is found that AUC exhibits a number of desirable properties when compared to overall accuracy: increased sensitivity in Analysis of Variance (ANOVA) tests; a standard error that decreased as both AUC and the number of test samples increased; decision threshold independent; invariant to a priori class probabilities; and it gives an indication of the amount of "work done" by a classification scheme, giving low scores to both random and "one class only" classifiers. It has been known for some time that AUC actually represents the probability that a randomly chosen positive example is correctly rated (ranked) with greater suspicion than a randomly chosen negative example. Moreover, this probability of correct ranking is the same quantity estimated by the non-parametric Wilcoxon statistic. We use this equivalence to show that the standard deviation of AUC, estimated using 10 fold cross validation, is a reliable estimator of the standard error estimated using the Wilcoxon test. The paper concludes with the recommendation that AUC be used in preference to overall accuracy when "single number" evaluation of machine learning algorithms is required.
Abstract: In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine …
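The ranking interpretation of AUC described in the abstract can be illustrated with a minimal Python sketch. It computes AUC directly as the fraction of (positive, negative) pairs that are ranked correctly, which is the quantity the Wilcoxon statistic estimates, and contrasts it with overall accuracy for a "one class only" classifier. The function names, the example scores, and the 0.5 decision threshold are hypothetical choices made for illustration. The standard error formula is the Hanley and McNeil approximation, which is an assumption here: the abstract does not say which formula the paper uses.

```python
import math


def auc_wilcoxon(pos_scores, neg_scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a correct ranking (the usual Wilcoxon convention).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auc_standard_error(auc, n_pos, n_neg):
    """Standard error of AUC via the Hanley-McNeil approximation.

    Assumption: this formula is not taken from the abstract above; it is a
    commonly used approximation based on the Wilcoxon statistic.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc * auc)
        + (n_neg - 1) * (q2 - auc * auc)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def overall_accuracy(pos_scores, neg_scores, threshold):
    """Fraction of examples classified correctly at a fixed decision threshold."""
    correct = sum(s >= threshold for s in pos_scores)
    correct += sum(s < threshold for s in neg_scores)
    return correct / (len(pos_scores) + len(neg_scores))


if __name__ == "__main__":
    # Hypothetical scores: one positive (0.4) is ranked below one negative (0.6).
    pos = [0.9, 0.8, 0.7, 0.4]
    neg = [0.6, 0.3, 0.2, 0.1]
    print(auc_wilcoxon(pos, neg))  # 15 of 16 pairs ranked correctly: 0.9375

    # A "one class only" classifier on imbalanced data: every example gets
    # the same score, so everything is labelled negative at threshold 0.5.
    const_pos = [0.0]
    const_neg = [0.0] * 9
    print(overall_accuracy(const_pos, const_neg, 0.5))  # 0.9
    print(auc_wilcoxon(const_pos, const_neg))           # 0.5
```

In this sketch the constant classifier reaches 90% accuracy purely because of the class imbalance, while its AUC stays at the chance level of 0.5. Under the assumed formula, `auc_standard_error` also shrinks as either the AUC or the number of test samples grows, in line with the property listed in the abstract.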
Related resources
Zoning of flood hazard in Nowshahr city using machine learning models
The aim of this study is to predict and model flood hazard in the city of Nowshahr, Mazandaran province using machine learning models. The criteria and indicators affecting flood hazard were identified based on the review of resources, and then the indicators were converted into rasters in ArcGIS environment, and finally standardized by fuzzy method for use in the models. K-nearest neighbor ...
Prediction of Sepsis Due to Acinetobacter Infection in Neonates Admitted to NICU
Background and Aim: Sepsis is the most important disease in the first 28 days of life and one of the main causes of infant mortality in the intensive care unit. Its definitive diagnosis is possible by performing blood culture. Neonatal sepsis can be a clinical sign of nosocomial infections that are often resistant to antibiotics. Therefore, the purpose of this study was to create and evaluate a...
PII: S0031-3203(96)00142-2
In this paper we investigate the use of the area under the receiver operating characteristic (ROC) curve (AUC) as a performance measure for machine learning algorithms. As a case study we evaluate six machine learning algorithms (C4.5, Multiscale Classifier, Perceptron, Multi-layer Perceptron, k-Nearest Neighbours, and a Quadratic Discriminant Function) on six "real world" medical diagnostics ...
Application of Support Vector Machine for Detection of Functional Limitations in the Diabetic Patients of the Northwest of IRAN in 2017: A Descriptive Study
Background and Objectives: Support vector machine (SVM) is a robust and effective statistical method for the diagnosis and prediction of clinical outcomes based on combinations of predictor variables. The aim of this study was to use SVM to detect the functional limitations in the diabetic patients and evaluate the accuracy of this diagnosis. Materials and Methods: This descriptive study was c...
Early Prediction of Gestational Diabetes Using Decision Tree and Artificial Neural Network Algorithms
Introduction: Gestational diabetes is associated with many short-term and long-term complications in mothers and newborns; hence, the detection of its risk factors can contribute to the timely diagnosis and prevention of relevant complications. The present study aimed to design and compare Gestational diabetes mellitus (GDM) prediction models using artificial intelligence algorithms. Materials ...
Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation
This review provides the basic principle and rationale for ROC analysis of rating and continuous diagnostic test results versus a gold standard. Derived indexes of accuracy, in particular the area under the curve (AUC), have a meaningful interpretation for distinguishing diseased from healthy subjects. The methods of estimating AUC and testing it in single diagnostic tests and also comparative studies...
Journal title:
- Pattern Recognition
Volume 30
Publication date: 1997