Accuracy vs. Comprehensibility in Data Mining Models
نویسندگان
چکیده
This paper addresses the important issue of the tradeoff between accuracy and comprehensibility in data mining. The paper presents results which show that it is, to some extent, possible to bridge this gap. A method for rule extraction from opaque models (Genetic Rule EXtraction – G-REX) is used to show the effects on accuracy when forcing the creation of comprehensible representations. In addition the technique of combining different classifiers to an ensemble is demonstrated on some well-known data sets. The results show that ensembles generally have very high accuracy, thus making them a good first choice when performing predictive data mining.
منابع مشابه
Building comprehensible customer churn prediction models with advanced rule induction techniques
Customer churn prediction models aim to detect customers with a high propensity to attrite. Predictive accuracy, comprehensibility, and justifiability are three key aspects of a churn prediction model. An accurate model permits to correctly target future churners in a retention marketing campaign, while a comprehensible and intuitive rule-set allows to identify the main drivers for customers to...
متن کاملThe Truth is In There - Rule Extraction from Opaque Models Using Genetic Programming
A common problem when using complicated models for prediction and classification is that the complexity of the model entails that it is hard, or impossible, to interpret. For some scenarios this might not be a limitation, since the priority is the accuracy of the model. In other situations the limitations might be severe, since additional aspects are important to consider; e.g. comprehensibilit...
متن کاملWhy Not Use an Oracle When You Got One?
The primary goal of predictive modeling is to achieve high accuracy when the model is applied to novel data. For certain problems this requires the use of complex techniques like neural networks or ensembles, resulting in opaque models that are hard or impossible to interpret. For some domains this is unacceptable, since models need to be comprehensible. To achieve comprehensibility, accuracy i...
متن کاملTuve Löfström & Ulf Johansson
When performing data mining, the selection of data mining technique is a critical decision. Often this choice boils down to whether a transparent model is needed or not. Most research indicates that techniques producing transparent models, such as decision trees, often have an inferior accuracy compared to techniques such as neural networks. On the other hand, models created by neural networks ...
متن کاملA Novel Genetic Algorithm Based Method for Building Accurate and Comprehensible Churn Prediction Models
Customer churn has become a critical problem for all companies in particular for those that are operating in service-based industries such as telecommunication industry. Data mining techniques have been used for constructing churn prediction models. Past research in churn prediction context have mainly focused on the accuracy aspect of the constructed churn models. However, in addition to the a...
متن کامل