Analysis of Example Weighting in Subgroup Discovery by Comparison of Three Algorithms on a Real-life Data Set
Abstract
This paper investigates the implications of example weighting in subgroup discovery by comparing three state-of-the-art subgroup discovery algorithms, APRIORI-SD, CN2-SD, and SubgroupMiner, on a real-life data set. APRIORI-SD and CN2-SD both use example weighting in the process of subgroup discovery, whereas SubgroupMiner does not. The two differ in where the weighting is applied: APRIORI-SD uses it in the post-processing step of selecting the ‘best’ rules, while CN2-SD uses it during rule induction. The results of applying the three algorithms to the UK Traffic challenge data set are presented in the form of ROC curves. They show that APRIORI-SD slightly outperforms CN2-SD; both APRIORI-SD and CN2-SD are good at finding small and highly accurate subgroups (describing minority classes), while SubgroupMiner found larger and less accurate subgroups (describing the majority class). We show by using ROC analysis that these results are not surprising and can be attributed to example weighting, which ‘pushes’ the search in the space of potential subgroups towards discovering small and accurate subgroups.
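To make the effect of example weighting concrete, the following is a minimal sketch of weighted covering applied as a post-processing step over a fixed pool of candidate rules. It is not the implementation of any of the three algorithms. The abstract does not specify a quality measure or a weighting formula, so the weighted relative accuracy used here and the additive scheme (an example already covered by i selected rules gets weight 1/(i+1)) are assumptions, and the function names, rule names, and toy data are hypothetical.

```python
def weighted_wracc(covered, positive, weights):
    """Weighted relative accuracy of one rule, computed from example weights.

    covered[i]  -- truthy if the rule's condition covers example i
    positive[i] -- truthy if example i belongs to the target class
    weights[i]  -- current weight of example i
    """
    total = sum(weights)
    w_cond = sum(w for c, w in zip(covered, weights) if c)
    if total == 0 or w_cond == 0:
        return 0.0
    w_class = sum(w for p, w in zip(positive, weights) if p)
    w_both = sum(w for c, p, w in zip(covered, positive, weights) if c and p)
    return (w_cond / total) * (w_both / w_cond - w_class / total)


def weighted_covering(rules, positive, k):
    """Select up to k rule names; after each pick, down-weight covered positives."""
    n = len(positive)
    times_covered = [0] * n
    weights = [1.0] * n
    remaining = dict(rules)
    selected = []
    for _ in range(k):
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda name: weighted_wracc(remaining[name], positive, weights),
        )
        if weighted_wracc(remaining[best], positive, weights) <= 0:
            break  # no remaining rule is better than the default distribution
        selected.append(best)
        covered = remaining.pop(best)
        for i in range(n):
            if covered[i] and positive[i]:
                times_covered[i] += 1
                # Assumed additive scheme: weight 1/(i+1) after i coverings.
                weights[i] = 1.0 / (times_covered[i] + 1)
    return selected


# Hypothetical toy data: 8 examples, the first 4 belong to the target class.
positive = [1, 1, 1, 1, 0, 0, 0, 0]
rules = {
    "A": [1, 1, 1, 0, 0, 0, 0, 0],  # three positives
    "B": [1, 1, 1, 0, 1, 0, 0, 0],  # the same three positives plus one negative
    "C": [0, 0, 0, 1, 0, 0, 0, 0],  # the one positive that A misses
}
selected = weighted_covering(rules, positive, 2)
```

On this toy data, ranking the rules by unweighted quality would place the larger, redundant rule B second. Once the positives covered by A are down-weighted, the small and accurate rule C is chosen instead, which is the kind of shift towards small, accurate subgroups that the abstract attributes to example weighting.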
Similar resources
ROC Analysis of Example Weighting in Subgroup Discovery
This paper presents two new ways of example weighting for subgroup discovery. The proposed example weighting schemes are applicable to any subgroup discovery algorithm that uses the weighted covering approach to discover interesting subgroups in data. To show the implications that the new example weighting schemes have on subgroup discovery, they were implemented in the APRIORI-SD algorithm. RO...
A new weighting approach to Non-Parametric composite indices compared with principal components analysis
The introduction of the Human Development Index (HDI) by UNDP in early 1990 was followed by a surge in the use of non-parametric and parametric indices for measuring and comparing countries' performance in development, globalization, competition, well-being, etc. The HDI is a composite index of three indicators. Its components are to reflect three major dimensions of human development: longevity, knowledg...
Using Subgroup Discovery to Analyze the UK Traffic Data
Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can also be adapted to subgroup discovery. Such an adaptation has already been done for the CN2 rule learning algorithm. In previous work this new algorithm, called CN2-SD, has been described in detail and applied to the well-known UCI data sets. This paper summarizes the mo...
Automatic clustering of mixed data using a genetic algorithm
In real-world clustering problems, one often needs to perform cluster analysis on data sets with mixed numeric and categorical values. However, most existing clustering algorithms are efficient only for numeric data, not for mixed data sets. In addition, traditional methods, for example, the K-means algorithm, usually ask the user to provide the number of clusters. In this...
Ranking DMUs by ideal points in the presence of fuzzy and ordinal data
Data Envelopment Analysis (DEA) is a very effective method to evaluate the relative efficiency of decision-making units (DMUs). DEA models divide all DMUs into two categories, efficient and inefficient DMUs, and are not able to discriminate between efficient DMUs. On the other hand, the observed values of the input and output data in real-life problems are sometimes imprecise or vague, such as interval data, ordinal da...