Statistical Tests for Comparing Supervised Classification Learning Algorithms

Author

  • Thomas G. Dietterich
Abstract

This paper reviews five statistical tests for determining whether one learning algorithm out-performs another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (Type I error). Two widely-used statistical tests are shown to have high probability of Type I error in certain situations and should never be used. These tests are (a) a test for the difference of two proportions and (b) a paired-differences t test based on taking several random train/test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of Type I error. A fourth test, McNemar's test, is shown to have low Type I error. The fifth test is a new test, 5x2cv, based on 5 iterations of 2-fold cross-validation. Experiments show that this test also has good Type I error. The paper also measures the power (ability to detect algorithm differences when they do exist) of these tests. The 5x2cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable Type I error. For algorithms that can be executed ten times, the 5x2cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
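
As a rough illustration of the two recommended tests, the sketch below computes a McNemar statistic and a 5x2cv paired t statistic in Python. The abstract does not state the formulas; the forms used here (a continuity-corrected chi-square statistic for McNemar's test, and a t statistic with 5 degrees of freedom for 5x2cv) are the ones usually given for these tests and should be checked against the full paper. The function names and the input layout are hypothetical.

```python
import math


def mcnemar_statistic(n01, n10):
    """Continuity-corrected McNemar statistic (assumed form).

    n01: test examples misclassified by algorithm A but not by B.
    n10: test examples misclassified by algorithm B but not by A.
    Compared against a chi-square distribution with 1 degree of freedom.
    """
    if n01 + n10 == 0:
        return 0.0  # the two classifiers never disagree
    return (abs(n01 - n10) - 1) ** 2 / (n01 + n10)


def five_by_two_cv_t(diffs):
    """5x2cv paired t statistic (assumed form).

    diffs: five (p1, p2) pairs, one per replication of 2-fold
    cross-validation; p1 and p2 are the differences in error rate
    between the two algorithms on fold 1 and fold 2.
    Compared against a t distribution with 5 degrees of freedom.
    """
    if len(diffs) != 5:
        raise ValueError("expected exactly 5 replications")
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        # Per-replication variance estimate from the two folds.
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    denom = math.sqrt(sum(variances) / 5)
    if denom == 0:
        raise ValueError("no variation across folds; statistic undefined")
    # The numerator is the difference from fold 1 of the first replication.
    return diffs[0][0] / denom
```

Under these assumptions, a McNemar statistic above 3.841 (chi-square, 1 degree of freedom) or a 5x2cv |t| above 2.571 (t, 5 degrees of freedom) would reject the null hypothesis of equal error rates at the 0.05 level.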

Similar resources

Combination of neural and statistical algorithms for supervised classification of remote-sensing image

Various experimental comparisons of algorithms for supervised classification of remote-sensing images have been reported in the literature. Among others, a comparison of neural and statistical classifiers has previously been made by the authors in (Serpico, S.B., Bruzzone, L., Roli, F., 1996. Pattern Recognition Letters 17, 1331-1341). Results of reported experiments have clearly shown that the s...


Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms

This paper reviews five approximate statistical tests for determining whether one learning algorithm out-performs another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (Type I error). Two widely-used statistical tests are shown to have high probability of Type I error in certain s...


Combining Labeled and Unlabeled Data for MultiClass Text Categorization

Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled, such ...


Machine Learning Research: Four Current Directions

Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.


Comparison and Combination of Statistical and Neural Network Algorithms for Remote-sensing Image Classification

In recent years, the remote-sensing community has become very interested in applying neural networks to image classification and in comparing the performance of neural networks with that of classical statistical methods. These experimental comparisons pointed out that no single classification algorithm can be regarded as a "panacea". The superiority of one algorithm over the other strongly depends ...



Publication date: 1996