On Exploring Soft Discretization of Continuous Attributes
Abstract
Searching for a binary partition of attribute domains is an important task in data mining, arising in both decision tree construction and discretization. The main advantages of decision tree methods are compact, clear knowledge representation and high classification accuracy. Decision tree algorithms also have drawbacks. For large data tables, existing decision tree induction methods are often inefficient in both computation and description. Standard decision tree methods are also unstable: small deviations in the data may require a significant reconstruction of the tree. We present novel soft discretization methods that use soft cuts instead of traditional crisp (or sharp) cuts. This concept makes it possible to generate more compact and stable decision trees with high classification accuracy. We also present an efficient method for generating soft cuts from large databases.
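The contrast between crisp and soft cuts can be illustrated with a minimal sketch. The abstract does not define a soft cut, so the code assumes one common formalization: a soft cut on an attribute is an interval [left, right], values below it go left, values above it go right, and values inside it are treated as uncertain. The names SoftCut, side, discerns, and crisp_side are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


def crisp_side(value: float, cut: float) -> str:
    """Crisp cut: every value falls strictly on one side of a single threshold."""
    return "left" if value < cut else "right"


@dataclass(frozen=True)
class SoftCut:
    """Hypothetical soft cut (an assumption, not the paper's definition):
    an uncertain band [left, right] on one continuous attribute."""
    left: float
    right: float

    def __post_init__(self):
        if self.left > self.right:
            raise ValueError("left bound must not exceed right bound")

    def side(self, value: float) -> str:
        """Classify a value as left, right, or inside the uncertain band."""
        if value < self.left:
            return "left"
        if value > self.right:
            return "right"
        return "uncertain"

    def discerns(self, u: float, v: float) -> bool:
        """True only when the two values lie on opposite sides, outside the band."""
        return {self.side(u), self.side(v)} == {"left", "right"}


# Two nearly identical values are separated by a crisp cut at 5.0 ...
print(crisp_side(4.99, 5.0), crisp_side(5.01, 5.0))
# ... but both fall in the uncertain band of a soft cut around 5.0.
soft = SoftCut(left=4.5, right=5.5)
print(soft.side(4.99), soft.side(5.01))
```

Under this assumption, a small perturbation of a value near the threshold does not flip its side, which is one way to read the stability claim made in the abstract.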
Similar resources
OFP_CLASS: a hybrid method to generate optimized fuzzy partitions for classification
The discretization of values plays a critical role in data mining and knowledge discovery. The representation of information through intervals is more concise and easier to understand at certain levels of knowledge than the representation by means of continuous values. In this paper, we propose a method for discretizing continuous attributes by means of a series of fuzzy sets, which constitutes a f...
Utilizing multiple pheromones in an ant-based algorithm for continuous-attribute classification rule discovery
The cAnt-Miner algorithm is an Ant Colony Optimization (ACO) based technique for classification rule discovery in problem domains which include continuous attributes. In this paper, we propose several extensions to cAnt-Miner. The main extension is based on the use of multiple pheromone types, one for each class value to be predicted. In the proposed μcAnt-Miner algorithm, an ant first selects a...
Dynamic Discretization of Continuous Attributes
Discretization of continuous attributes is an important task for certain types of machine learning algorithms. Bayesian approaches, for instance, require assumptions about data distributions. Decision trees, on the other hand, require sorting operations to deal with continuous attributes, which largely increase learning times. This paper presents a new method of discretization, whose main char...
Soft Discretization to Enhance the Continuous Decision Tree Induction
Decision tree induction has been widely used to generate classifiers from training data through a process of recursively splitting the data space. In the case of training on continuous-valued data, the associated attributes must be discretized in advance or during the learning process. The commonly used method is to partition the attribute range into two or several intervals using a single or a...
Dynamic discreduction using Rough Sets
Discretization of continuous attributes is a necessary pre-requisite in deriving association rules and discovery of knowledge from databases. The derived rules are simpler and intuitively more meaningful if only a small number of attributes are used, and each attribute is discretized into a few intervals. The present research paper explores the interrelation between discretization and reduction...