A Comparison of Four Approaches to Discretization Based on Entropy

نویسندگان

  • Jerzy W. Grzymala-Busse
  • Teresa Mroczek
چکیده

We compare four discretization methods, all based on entropy: the original C4.5 approach to discretization, two globalized methods, known as equal interval width and equal frequency per interval, and a relatively new method for discretization called multiple scanning using the C4.5 decision tree generation system. The main objective of our research is to compare the quality of these four methods using two criteria: an error rate evaluated by ten-fold cross-validation and the size of the decision tree generated by C4.5. Our results show that multiple scanning is the best discretization method in terms of the error rate and that decision trees generated from datasets discretized by multiple scanning are simpler than decision trees generated directly by C4.5 or generated from datasets discretized by both globalized discretization methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Error-Based and Entropy-Based Discretization of Continuous Features

We present a comparison of error-based and entropybased methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of scenarios where error minimization may be an inappropriate discretization criterion. We present a discretization method based on the C4.5 decision tree algorithm and compare it to an existing entropy-based ...

متن کامل

An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches

Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...

متن کامل

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset

Background: In this paper, a generic hesitant fuzzy set (HFS) model for clustering various ECG beats according to weights of attributes is proposed. A comprehensive review of the electrocardiogram signal classification and segmentation methodologies indicates that algorithms which are able to effectively handle the nonstationary and uncertainty of the signals should be used for ECG analysis. Ex...

متن کامل

An Updated Review of Goodness of Fit Tests Based on Entropy

Different approaches to goodness of fit (GOF) testing are proposed. This survey intends to present the developments on Goodness of Fit based on entropy during the last 50 years, from the very first origins until the most recent advances for different data and models. Goodness of fit tests based on Shannon entropy was started by Vasicek in 1976 and were continued by many authors. In this paper, ...

متن کامل

Determination of weight vector by using a pairwise comparison matrix based on DEA and Shannon entropy

The relation between the analytic hierarchy process (AHP) and data envelopment analysis (DEA) is a topic of interest to researchers in this branch of applied mathematics. In this paper, we propose a linear programming model that generates a weight (priority) vector from a pairwise comparison matrix. In this method, which is referred to as the E-DEAHP method, we consider each row of the pairwise...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Entropy

دوره 18  شماره 

صفحات  -

تاریخ انتشار 2016