Analysis of Example Weighting in Subgroup Discovery by Comparison of Three Algorithms on a Real-life Data Set

Authors

  • Branko Kavšek
  • Nada Lavrač
Abstract

This paper investigates the implications of example weighting in subgroup discovery by comparing three state-of-the-art subgroup discovery algorithms, APRIORI-SD, CN2-SD, and SubgroupMiner, on a real-life data set. APRIORI-SD and CN2-SD both use example weighting in the process of subgroup discovery, whereas SubgroupMiner does not. The two differ in where the weighting is applied: APRIORI-SD uses it in the post-processing step of selecting the 'best' rules, while CN2-SD uses it during rule induction. The results of applying the three algorithms to the UK Traffic challenge data set are presented in the form of ROC curves. They show that APRIORI-SD slightly outperforms CN2-SD, and that both APRIORI-SD and CN2-SD are good at finding small and highly accurate subgroups (describing minority classes), while SubgroupMiner found larger and less accurate subgroups (describing the majority class). We show by using ROC analysis that these results are not surprising and can be attributed to example weighting, which 'pushes' the search in the space of potential subgroups towards discovering small and accurate subgroups.
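The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of how example weighting could steer rule selection, not the authors' implementation. It assumes an additive weight of 1/(i+1) for a target-class example already covered by i selected rules and a weighted form of relative accuracy as the rule score. The names `Rule`, `additive_weight`, `weighted_wracc`, `weighted_covering`, and `roc_point` are hypothetical. Because it selects from a fixed candidate set, it is closer in spirit to weighting applied in post-processing than to weighting applied during rule induction.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Dict[str, float]
Rule = Callable[[Example], bool]  # hypothetical: a rule is a predicate over one example


def additive_weight(times_covered: int) -> float:
    """Assumed additive scheme: weight 1 / (i + 1) after i coverings."""
    return 1.0 / (times_covered + 1)


def weighted_wracc(rule: Rule, examples: Sequence[Example], labels: Sequence[int],
                   target: int, weights: Sequence[float]) -> float:
    """Weighted relative accuracy, with example counts replaced by weight sums."""
    total = sum(weights)
    n_cond = sum(w for e, w in zip(examples, weights) if rule(e))
    if n_cond == 0:
        return 0.0  # a rule covering no weight carries no information
    n_cond_class = sum(w for e, y, w in zip(examples, labels, weights)
                       if rule(e) and y == target)
    n_class = sum(w for y, w in zip(labels, weights) if y == target)
    return (n_cond / total) * (n_cond_class / n_cond - n_class / total)


def weighted_covering(examples: Sequence[Example], labels: Sequence[int],
                      candidates: Sequence[Rule], target: int,
                      n_rules: int) -> List[int]:
    """Greedily pick up to n_rules candidate indices, re-weighting after each pick."""
    counts = [0] * len(examples)  # times each target example has been covered
    remaining = list(range(len(candidates)))
    selected: List[int] = []
    while remaining and len(selected) < n_rules:
        weights = [additive_weight(c) for c in counts]
        best = max(remaining, key=lambda i: weighted_wracc(
            candidates[i], examples, labels, target, weights))
        selected.append(best)
        remaining.remove(best)
        # Covered target-class examples are down-weighted, not removed.
        for j, (e, y) in enumerate(zip(examples, labels)):
            if y == target and candidates[best](e):
                counts[j] += 1
    return selected


def roc_point(rule: Rule, examples: Sequence[Example], labels: Sequence[int],
              target: int) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) of one subgroup.

    Assumes both the target class and its complement are present in labels.
    """
    pos = sum(1 for y in labels if y == target)
    neg = len(labels) - pos
    tp = sum(1 for e, y in zip(examples, labels) if rule(e) and y == target)
    fp = sum(1 for e, y in zip(examples, labels) if rule(e) and y != target)
    return fp / neg, tp / pos
```

Under these assumptions, once a subgroup has been selected, the examples it covers count for less, so a later candidate scores well only if it is accurate on what remains. This is one way to read the abstract's remark that weighting 'pushes' the search towards small and accurate subgroups.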


Related resources

ROC Analysis of Example Weighting in Subgroup Discovery

This paper presents two new ways of example weighting for subgroup discovery. The proposed example weighting schemes are applicable to any subgroup discovery algorithm that uses the weighted covering approach to discover interesting subgroups in data. To show the implications that the new example weighting schemes have on subgroup discovery, they were implemented in the APRIORI-SD algorithm. RO...


A new weighting approach to non-parametric composite indices compared with principal components analysis

The introduction of the Human Development Index (HDI) by UNDP in early 1990 was followed by a surge in the use of non-parametric and parametric indices for measuring and comparing countries' performance in development, globalization, competition, well-being, etc. The HDI is a composite index of three indicators. Its components reflect three major dimensions of human development: longevity, knowledg...


Using Subgroup Discovery to Analyze the UK Traffic Data

Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can be adapted also to subgroup discovery. Such an adaptation has already been done for the CN2 rule learning algorithm. In previous work this new algorithm, called CN2-SD, has been described in detail and applied to the well known UCI data sets. This paper summarizes the mo...


Automatic clustering of mixed data using a genetic algorithm

In real-world clustering problems, it is often necessary to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data rather than for mixed data. In addition, traditional methods, for example the K-means algorithm, usually ask the user to provide the number of clusters. In this...


Ranking DMUs by ideal points in the presence of fuzzy and ordinal data

Data Envelopment Analysis (DEA) is a very effective method to evaluate the relative efficiency of decision-making units (DMUs). DEA models divide all DMUs into two categories, efficient and inefficient DMUs, and are not able to discriminate between efficient DMUs. On the other hand, the observed values of the input and output data in real-life problems are sometimes imprecise or vague, such as interval data, ordinal da...



Publication date: 2004