Should We Really Use Post-Hoc Tests Based on Mean-Ranks?

Authors

Alessio Benavoli

Giorgio Corani

Francesca Mangili

Abstract

The statistical comparison of multiple algorithms over multiple data sets is fundamental in machine learning. This is typically carried out by the Friedman test. When the Friedman test rejects the null hypothesis, multiple comparisons are carried out to establish which differences among algorithms are significant. The multiple comparisons are usually performed using the mean-ranks test. The aim of this technical note is to discuss the inconsistencies of the mean-ranks post-hoc test, with the goal of discouraging its use in machine learning as well as in medicine, psychology, and other fields. We show that the outcome of the mean-ranks test depends on the pool of algorithms originally included in the experiment. In other words, the outcome of the comparison between algorithms A and B also depends on the performance of the other algorithms included in the original experiment. This can lead to paradoxical situations. For instance, the difference between A and B could be declared significant if the pool comprises algorithms C, D, E and not significant if the pool comprises algorithms F, G, H. To overcome these issues, we suggest performing the multiple comparisons with a test whose outcome depends only on the two algorithms being compared, such as the sign test or the Wilcoxon signed-rank test.
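
The pool dependence described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the scores are invented for illustration, the function names are hypothetical, and the mean-ranks statistic is assumed to take the usual normal-approximation form z = (R_A - R_B) / sqrt(m(m+1)/(6n)), with m algorithms, n data sets, and no multiple-comparison correction. The sign test is used here as the simpler of the two pairwise tests the abstract suggests.

```python
import math

def average_ranks(scores):
    """Rank one data set's scores: rank 1 = highest score, ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mean_ranks_z(pool, first, second):
    """z statistic of the mean-ranks test for two algorithms ranked within a pool."""
    names = list(pool)
    m, n = len(names), len(pool[first])
    totals = dict.fromkeys(names, 0.0)
    for d in range(n):
        ranks = average_ranks([pool[name][d] for name in names])
        for name, r in zip(names, ranks):
            totals[name] += r
    diff = (totals[first] - totals[second]) / n
    return diff / math.sqrt(m * (m + 1) / (6 * n))

def two_sided_p_from_z(z):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

def sign_test_p(x, y):
    """Exact two-sided sign test on paired scores; ties are discarded."""
    wins = sum(a > b for a, b in zip(x, y))
    losses = sum(a < b for a, b in zip(x, y))
    n, k = wins + losses, max(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Invented accuracies on 10 data sets; A beats B on 9 of them.
A = [0.90, 0.85, 0.80, 0.75, 0.88, 0.92, 0.70, 0.83, 0.79, 0.86]
B = [0.80, 0.75, 0.70, 0.65, 0.78, 0.82, 0.60, 0.73, 0.89, 0.76]

# Pool 1: C, D, E always score between A and B, pushing their ranks apart.
mid = [(a + b) / 2 for a, b in zip(A, B)]
pool_cde = {"A": A, "B": B, "C": mid,
            "D": [v + 0.01 for v in mid], "E": [v - 0.01 for v in mid]}

# Pool 2: F, G, H always score below both A and B, keeping their ranks adjacent.
low = [min(a, b) for a, b in zip(A, B)]
pool_fgh = {"A": A, "B": B, "F": [v - 0.05 for v in low],
            "G": [v - 0.06 for v in low], "H": [v - 0.07 for v in low]}

z_cde = mean_ranks_z(pool_cde, "A", "B")
z_fgh = mean_ranks_z(pool_fgh, "A", "B")
p_sign = sign_test_p(A, B)  # uses only A and B, so it is the same for both pools

print(f"mean-ranks, pool with C, D, E: z = {z_cde:.3f}, p = {two_sided_p_from_z(z_cde):.4g}")
print(f"mean-ranks, pool with F, G, H: z = {z_fgh:.3f}, p = {two_sided_p_from_z(z_fgh):.4g}")
print(f"sign test, A vs B:             p = {p_sign:.4f}")
```

With these made-up scores the A and B results are identical in both pools, yet the mean-ranks statistic is about -4.53 in the first pool and about -1.13 in the second, so the same pair would be declared significantly different at the 0.05 level only in the first. The sign test returns one p-value (22/1024, about 0.0215) regardless of which other algorithms were run.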

Similar resources

Statistical Comparisons of Classifiers over Multiple Data Sets

While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines se...

The effect of functional fatigue on dynamic and static balance in boy students with different plantar arch

The aim of this study was to investigate the effect of functional fatigue on the dynamic and static balance of male students aged 15 to 18 with different plantar arch. To measure the subjects' foot arch, the Navicular Drop Test was used. The subjects were then randomly divided into three groups of 22 members with different foot arches. Later on, SEBT test, Modified Stork Balance, Fatigue Protoc...

Factors Effective on Drug Abuse from the Male Prisoners Point of View: Case Study of One of the Southeastern Prisons in Iran

Background: Identifying the factors effective on the tendency to substance use from the viewpoint of high-risk groups such as prisoners is essential for planning to control and prevent substance use. The purpose of this study was to determine the factors related to substance use tendency from the prisoners' point of view. Methods: This descriptive and analytic cross-sectional study was perfo...

Comparing the Effectiveness of Mindfulness and Aerobic Exercise on Psychological Factors and Sleep Quality Following Recovery from COVID-19

Background and purpose: The COVID-19 pandemic has led to some psychological disorders and sleep problems that should be taken into account after recovery. After recovering from COVID-19, people are at risk of sleep disorders, depression, and low quality of life, and there is a paucity of information about this issue. The present study aimed to compare the effectiveness of mindfulness and aerobic ...

Post Hoc Power: Tables and Commentary

Post hoc power is the retrospective power of an observed effect based on the sample size and parameter estimates derived from a given data set. Many scientists recommend using post hoc power as a follow-up analysis, especially if a finding is nonsignificant. This article presents tables of post hoc power for common t and F tests. These tables make it explicitly clear that for a given significan...

Journal title:

Journal of Machine Learning Research

Volume 17

Publication date: 2016
