Preceding Rule Induction with Instance Reduction Methods
Authors
Abstract
A new prepruning technique for rule induction is presented that applies instance reduction to the training set before the rules are induced. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms are compared: Edited Nearest Neighbour (ENN), AllKnn and DROP5. Each one is used to reduce the size of the training set before a set of rules is induced using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (composed of AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2 without adversely affecting predictive performance. The hybrid achieves the highest average predictive accuracy.
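The instance reduction step can be illustrated with a minimal sketch of Edited Nearest Neighbour, the simplest of the three filters named in the abstract. This is not the authors' implementation: the function name `enn_filter`, the choice of k = 3, the Euclidean distance and the toy data are all assumptions made for illustration. The rule learner (CN2) is not shown; the filtered training set would simply be passed to it in place of the full one.

```python
import math
from collections import Counter


def enn_filter(X, y, k=3):
    """Return the indices of the instances kept by Edited Nearest Neighbour.

    An instance is kept only if its label agrees with the majority label
    of its k nearest neighbours (Euclidean distance, instance itself excluded).
    Assumption: an odd k with two classes, so majority ties do not arise.
    """
    kept = []
    for i, xi in enumerate(X):
        # Rank every other instance by distance to instance i, keep the k closest.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        votes = Counter(y[j] for j in neighbours)
        majority_label = votes.most_common(1)[0][0]
        if majority_label == y[i]:
            kept.append(i)
    return kept


# Toy data (hypothetical): two clean clusters plus one mislabelled point
# at (0.5, 0.5) that sits inside the "a" cluster but carries label "b".
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (5, 5), (5, 6), (6, 5), (6, 6),
     (0.5, 0.5)]
y = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

kept = enn_filter(X, y, k=3)
X_reduced = [X[i] for i in kept]
y_reduced = [y[i] for i in kept]
print(kept)  # the mislabelled instance (index 8) is dropped
```

In this toy case only the mislabelled instance is discarded, so a rule learner trained on `X_reduced` and `y_reduced` would not need an extra rule to cover it. That is the intuition behind the smaller rule-sets reported in the abstract.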
Similar resources
IRDDS: Instance reduction based on Distance-based decision surface
In instance-based learning, a training set is given to a classifier for classifying new instances. In practice, not all information in the training set is useful for classifiers. Therefore, it is convenient to discard irrelevant instances from the training set. This process is known as instance reduction, which is an important task for classifiers since through this process the time for classif...
Mining Soft-Matching Rules from Textual Data
Text mining concerns the discovery of knowledge from unstructured textual data. One important task is the discovery of rules that relate specific words and phrases. Although existing methods for this task learn traditional logical rules, soft-matching methods that utilize word-frequency information generally work better for textual data. This paper presents a rule induction system, TEXTRISE, th...
Noise-Tolerant Rule induction from Multi-Instance data
This paper addresses the issue of multiple-instance induction of rules in the presence of noise. It first proposes a multiple-instance extension of rule-based learning algorithms. Then, it shows what kind of noise can appear in multiple-instance data, and how to handle it theoretically. Finally, it describes the implementation of such a noise-tolerant multiple instance learner, and shows its pe...
Unifying Instance-Based and Rule-Based Induction
Several well-developed approaches to inductive learning now exist, but each has specific limitations that are hard to overcome. Multi-strategy learning attempts to tackle this problem by combining multiple methods in one algorithm. This article describes a unification of two widely-used empirical approaches: rule induction and instance-based learning. In the new algorithm, instances are treated a...
Rule Induction by EDA with Instance-Subpopulations
In this paper, a new rule induction method using EDA with instance-subpopulations is proposed. The proposed method introduces the notion of an instance-subpopulation, which is a set of individuals matching a training instance. The EDA procedure is then carried out separately for each instance-subpopulation. Individuals generated by each EDA procedure are merged to constitute the population at the next ...