Fast Effective Rule Induction
Authors
Abstract
Many existing rule learning systems are computationally expensive on large noisy datasets. In this paper we evaluate the recently proposed rule learning algorithm IREP on a large and diverse collection of benchmark problems. We show that while IREP is extremely efficient, it frequently gives error rates higher than those of C4.5 and C4.5rules. We then propose a number of modifications resulting in an algorithm, RIPPERk, that is very competitive with C4.5rules with respect to error rates but much more efficient on large samples. RIPPERk obtains error rates lower than or equivalent to C4.5rules on 22 of 37 benchmark problems, scales nearly linearly with the number of training examples, and can efficiently process noisy datasets containing hundreds of thousands of examples.
Similar resources
Comparison among Suffix Array Construction Algorithms
A suffix array is a compact data structure for searching matched strings from text databases. It is an array of pointers and stores all suffixes of a text in lexicographic order. Because its memory requirement is less than that of tree structures, it is effective for large databases. Moreover, constructing the suffix array is used in the Block Sorting compression scheme. We compare algorithms for constructing suffix a...
Algorithms for Segmenting Time Series
As with most computer science problems, representation of the data is the key to efficient and effective solutions. Piecewise linear representation has been used for the representation of the data. This representation has been used by various researchers to support clustering, classification, indexing and association rule mining of time series data. A variety of algorithms have been proposed to obtain...
A Margin-based Model with a Fast Local Search for Rule Weighting and Reduction in Fuzzy Rule-based Classification Systems
Fuzzy Rule-Based Classification Systems (FRBCS) are highly investigated by researchers due to their noise-stability and interpretability. Unfortunately, generating a rule-base which is sufficiently both accurate and interpretable, is a hard process. Rule weighting is one of the approaches to improve the accuracy of a pre-generated rule-base without modifying the original rules. Most of the pro...
A decision-tree-based symbolic rule induction system for text categorization
We present a decision-tree-based symbolic rule induction system for categorizing text documents automatically. Our method for rule induction involves the novel combination of (1) a fast decision tree induction algorithm especially suited to text data and (2) a new method for converting a decision tree to a rule set that is simplified, but still logically equivalent to, the original tree. We rep...
Handling Time Changing Data with Adaptive Very Fast Decision Rules
Data streams are usually characterized by changes in the underlying distribution generating data. Therefore algorithms designed to work with data streams should be able to detect changes and quickly adapt the decision model. Rules are one of the most interpretable and flexible models for data mining prediction tasks. In this paper we present the Adaptive Very Fast Decision Rules (AVFDR), an on-...