Global discretization of continuous attributes as preprocessing for machine learning
Abstract
Real-life data usually are presented in databases by real numbers. On the other hand, most inductive learning methods require a small number of attribute values. Thus it is necessary to convert input data sets with continuous attributes into input data sets with discrete attributes. Methods of discretization restricted to single continuous attributes will be called local, while methods that simultaneously convert all continuous attributes will be called global. In this paper, a method of transforming any local discretization method into a global one is presented. A global discretization method, based on cluster analysis, is presented and compared experimentally with three known local methods, transformed into global. Experiments include tenfold cross-validation and leaving-one-out methods for ten real-life data sets. © 1996 Elsevier Science Inc.
KEYWORDS: discretization, quantization, continuous attributes, machine learning from examples, rough set theory
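The distinction between a local step (one attribute at a time) and a judgment made on the whole table can be illustrated with a minimal sketch. It assumes equal-width intervals as the local method and a simple consistency measure over the discretized table; the abstract does not specify either choice, and the function names `equal_width_bins`, `discretize_table`, and `consistency_level` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict


def equal_width_bins(values, k):
    """Local step (assumed method): map one continuous attribute to interval indices 0..k-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has a single interval.
        return [0] * len(values)
    width = (hi - lo) / k
    # Clamp so that the maximum value falls into the last interval.
    return [min(int((v - lo) / width), k - 1) for v in values]


def discretize_table(rows, k):
    """Apply the local step to every column of a table of continuous attributes."""
    columns = list(zip(*rows))
    binned = [equal_width_bins(col, k) for col in columns]
    return [tuple(r) for r in zip(*binned)]


def consistency_level(discrete_rows, labels):
    """Fraction of examples whose discretized attribute vector maps to exactly one class."""
    classes = defaultdict(set)
    for row, label in zip(discrete_rows, labels):
        classes[row].add(label)
    consistent = sum(1 for row in discrete_rows if len(classes[row]) == 1)
    return consistent / len(discrete_rows)


# Toy data (invented for illustration): two continuous attributes, two classes.
rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0)]
labels = ["a", "a", "b", "b"]
table = discretize_table(rows, 2)
print(table, consistency_level(table, labels))
```

With too few intervals, examples from different classes collapse onto the same discretized row and the consistency level drops. A whole-table criterion of this kind is one plausible way to decide how many intervals each attribute needs, though whether the paper uses exactly this measure is not stated in the abstract.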
Similar resources
Discretization of Continuous Attributes in Supervised Learning algorithms
We propose a new algorithm, called CILA, for discretization of continuous attributes. The CILA algorithm can be used with any class-labeled data. The tests performed using the CILA algorithm show that it generates discretization schemes with almost always the highest dependence between the class labels and the discrete intervals, and always with significantly lower number of intervals, when comp...
MIDCA --- A Discretization Model for Data Preprocessing in Data Mining
Decision tree is one of the most widely used and practical methods in data mining and machine learning discipline. However, many discretization algorithms developed in this field focus on univariate only, which is inadequate to handle the critical problems especially owned by medical domain. In this paper, we propose a new multivariate discretization method called Multivariate Interdependent Di...
Unsupervised Discretization Using Kernel Density Estimation
Discretization, defined as a set of cuts over domains of attributes, represents an important preprocessing task for numeric data analysis. Some Machine Learning algorithms require a discrete feature space but in real-world applications continuous attributes must be handled. To deal with this problem many supervised discretization methods have been proposed but little has been done to synthesize...
Making Better Use of Global Discretization
Before applying learning algorithms to datasets, practitioners often globally discretize any numeric attributes. If the algorithm cannot handle numeric attributes directly, prior discretization is essential. Even if it can, prior discretization often accelerates induction, and may produce simpler and more accurate classifiers. As it is generally done, global discretization denies the learning al...
Dynamic Discretization of Continuous Attributes
Discretization of continuous attributes is an important task for certain types of machine learning algorithms. Bayesian approaches, for instance, require assumptions about data distributions. Decision Trees, on the other hand, require sorting operations to deal with continuous attributes, which largely increase learning times. This paper presents a new method of discretization, whose main char...
Journal title:
- Int. J. Approx. Reasoning
Volume 15
Publication date: 1996