Association Rule Discovery with Unbalanced Class Distributions
Abstract
Many methods exist for finding association rules in very large data sets. However, it is well known that most general association rule discovery methods find too many rules, many of which are uninteresting. Furthermore, the performance of many such algorithms deteriorates when the minimum support is low. They also fail to find many interesting rules even when support is low, particularly when the classes are significantly unbalanced. In this paper we present an algorithm that finds association rules based on a set of new interestingness criteria. The algorithm is applied to a real-world health data set and successfully identifies groups of patients with a high risk of adverse reaction to certain drugs. A statistically guided method for selecting appropriate features has also been developed. Initial results show that the proposed algorithm can find interesting patterns in data sets with unbalanced class distributions without performance loss.
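To make the unbalanced-class problem concrete, the following minimal sketch computes standard rule measures on a toy data set. Everything in it is an assumption for illustration: the records, the item names (`drug_D`, `age_60+`), the helper `rule_stats`, and the "class support" measure are hypothetical and are not the interestingness criteria proposed in the paper, which the abstract does not specify.

```python
# Hypothetical toy data: 1000 patient records, only 20 (2%) in the "adverse" class.
# Each record is (set of items, class label). All values are invented for illustration.
RECORDS = (
    [({"drug_D", "age_60+"}, "adverse")] * 15
    + [({"drug_D"}, "adverse")] * 5
    + [({"drug_D", "age_60+"}, "normal")] * 35
    + [({"drug_D"}, "normal")] * 445
    + [(set(), "normal")] * 500
)


def rule_stats(records, antecedent, label):
    """Return basic measures for the rule `antecedent -> label`.

    Assumes at least one record matches the antecedent and the class is non-empty.
    """
    n = len(records)
    n_class = sum(1 for _, c in records if c == label)
    n_ante = sum(1 for items, _ in records if antecedent <= items)
    n_both = sum(
        1 for items, c in records if antecedent <= items and c == label
    )
    confidence = n_both / n_ante
    return {
        # Fraction of all records that match both antecedent and class.
        "support": n_both / n,
        # Fraction of antecedent matches that fall in the class.
        "confidence": confidence,
        # Fraction of the class covered by the antecedent (illustrative measure).
        "class_support": n_both / n_class,
        # Confidence relative to the class's base rate.
        "lift": confidence / (n_class / n),
    }


stats = rule_stats(RECORDS, {"drug_D", "age_60+"}, "adverse")
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

In this toy data the rule covers 75% of the minority class and has a lift of roughly 15, yet its global support is only 1.5%. A miner run with a typical minimum support of, say, 5% would never consider it, and lowering the threshold far enough to admit it would also admit a very large number of uninteresting rules.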
Similar resources
Classification approach based on association rules mining for unbalanced data
This paper deals with supervised classification when the response variable is binary and its class distribution is unbalanced. In such a situation, it is not possible to build a powerful classifier by using standard methods such as logistic regression, classification trees, discriminant analysis, etc. To overcome this shortcoming of these methods, which yield classifiers with low sensitivity, ...
Classification Rule Learning with APRIORI-C
Mining of association rules has become one of the strongest fields of data mining. This paper presents a classification rule learning algorithm, APRIORI-C, upgrading APRIORI to deal with classification problems, decreasing its memory consumption and time complexity, further decreasing its time complexity by feature subset selection, and improving the understandability of results by rule post-processing...
Regional Association Rule Mining
This project [4] centers on regional association rule mining and scoping in spatial datasets. We introduce a methodology for mining spatial association rules and propose new algorithms to determine the scope of a spatial association rule. We develop a reward-based region discovery framework that employs clustering to find interesting regions. The framework is applied to solve two distinct reg...
Classification of Highly Unbalanced CYP450 Data of Drugs Using Cost Sensitive Machine Learning Techniques
In this paper, we study the classification of unbalanced data sets of drugs. As an example, we chose a data set of 2D6 inhibitors of cytochrome P450. The human cytochrome P450 2D6 isoform plays a key role in the metabolism of many drugs in the preclinical drug discovery process. We have collected a data set from annotated public data and calculated physicochemical properties with chemoinformati...
FCP-Growth: Class Itemsets for Class Association Rules
Since the first work of Liu, Hsu, & Ma (1998), various works have shown the good performance of associative classification (association-based classification) in terms of error rate reduction. Associative classification deals with the prediction of the class from association rules, known as class association rules or predictive association rules. A class association rule is a rule whose consequent mus...