Tuning Data Mining Methods for Cost-Sensitive Regression: A Study in Loan Charge-Off Forecasting

Authors

  • Gaurav Bansal
  • Atish P. Sinha
  • Huimin Zhao

Abstract

Real-world predictive data mining (classification or regression) problems are often cost sensitive, meaning that different types of prediction errors are not equally costly. While cost-sensitive learning methods for classification problems have been extensively studied in recent years, cost-sensitive regression has not yet been adequately addressed in the data mining literature. In this paper, we first advocate the use of average misprediction cost as a measure for assessing the performance of a cost-sensitive regression model. We then propose an efficient algorithm for tuning a regression model to further reduce its average misprediction cost. In contrast with previous statistical methods, which are tailored to particular cost functions, this algorithm can deal with any convex cost function without modifying the underlying regression methods. We have evaluated the algorithm in bank loan charge-off forecasting, where underforecasting is considered much more costly than overforecasting. Our results show that the proposed algorithm significantly reduces the average misprediction costs of models learned with various base regression methods, such as linear regression, model trees, and neural networks. The amount of cost reduction increases as the difference between the unit costs of the two types of errors (overprediction and underprediction) increases.
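The abstract evaluates a cost-sensitive model by its average misprediction cost and tunes the model without modifying the underlying regression method, but it does not spell out the tuning algorithm. The Python sketch below is illustrative only and rests on stated assumptions: it uses an asymmetric linear ("lin-lin") cost as one example of a convex cost function, defines the error as actual minus predicted (so a positive error is an underforecast), and assumes that tuning means adding a single constant shift to every prediction. Under those assumptions, a convex, nonnegative cost that is zero at zero error makes the average cost convex in the shift, so a simple ternary search finds the best shift. The function names (`lin_lin_cost`, `average_misprediction_cost`, `tune_shift`) and the toy data are hypothetical and do not come from the paper.

```python
from typing import Callable, List, Sequence

# A cost function maps one error (actual - predicted) to a nonnegative cost.
Cost = Callable[[float], float]


def lin_lin_cost(error: float, c_under: float, c_over: float) -> float:
    """Asymmetric linear cost (an example convex cost, assumed for illustration).

    error > 0 means the model underpredicted; error < 0 means it overpredicted.
    """
    return c_under * error if error > 0 else -c_over * error


def average_misprediction_cost(actual: Sequence[float],
                               predicted: Sequence[float],
                               cost: Cost) -> float:
    """Mean cost of the prediction errors over all cases."""
    return sum(cost(a - p) for a, p in zip(actual, predicted)) / len(actual)


def tune_shift(actual: Sequence[float], predicted: Sequence[float],
               cost: Cost, iters: int = 100) -> float:
    """Hypothetical tuning step: find a constant d to add to every prediction.

    Assumes cost is convex, nonnegative and zero at zero error, so the
    average cost of (predicted + d) is convex in d and has a minimizer
    between the smallest and largest residual; ternary search finds it.
    """
    residuals: List[float] = [a - p for a, p in zip(actual, predicted)]

    def avg_cost(d: float) -> float:
        # Shifting predictions up by d shifts every residual down by d.
        return sum(cost(r - d) for r in residuals) / len(residuals)

    lo, hi = min(residuals), max(residuals)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if avg_cost(m1) <= avg_cost(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2


# Toy usage (made-up numbers): underforecasting costs 3x overforecasting.
actual = [10.0, 12.0, 9.0, 15.0, 11.0]
predicted = [10.0] * 5


def cost(e: float) -> float:
    return lin_lin_cost(e, c_under=3.0, c_over=1.0)


before = average_misprediction_cost(actual, predicted, cost)
shift = tune_shift(actual, predicted, cost)
after = average_misprediction_cost(actual, [p + shift for p in predicted], cost)
print(f"before={before:.3f} shift={shift:.3f} after={after:.3f}")
```

With an underprediction cost three times the overprediction cost, the tuned shift raises every forecast by about 2 and lowers the average cost on this toy data from 5.0 to about 3.0. For the lin-lin cost specifically, the optimal constant shift also has a closed form: a c_under / (c_under + c_over) = 0.75 quantile of the residuals, which here is 2. The search-based version is shown because it does not rely on that closed form and applies to any convex cost of the kind assumed above.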

Similar resources

Cost-sensitive Global Model Trees applied to loan charge-off forecasting

Keywords: Cost-sensitive regression, Model trees, Evolutionary algorithms, Asymmetric costs, Loan charge-off forecasting. Regression learning methods in real-world applications often require cost minimization instead of the reduction of various metrics of prediction errors. Currently in the literature, there is a lack of white box solutions that can deal with forecasting proble...

Predicting personal credit ratings using ubiquitous data mining

Ubiquitous data mining (UDM) is a methodology for creating new knowledge by building an integrated financial database in a ubiquitous computing environment, extracting useful rules by using diverse rule-extraction-based data mining techniques, and combining these rules. In this study, we built six credit rating forecasting models using traditional statistical methods (i.e., logistic regression ...

Town trip forecasting based on data mining techniques

In this paper, a data mining approach is proposed for duration prediction of the town trips (travel time) in New York City. In this regard, at first, two novel approaches, including a mathematical and a statistical approach, are proposed for grouping categorical variables with a huge number of levels. The proposed approaches work based on the cost matrix generated by repetitive post-hoc tests f...

Project Time and Cost Forecasting using Monte Carlo simulation and Artificial Neural Networks

The aim of this study is to present a new method to predict project time and cost under uncertainty. Assuming that what happens in projects implementation which is expressed in the form of Earned Value Management (EVM) indicators is primarily related to the nature of randomness or unreliability, in this study, by using Monte Carlo simulation, and assuming a specific distribution for the time an...

Credit Card Fraud Detection using Data mining and Statistical Methods

Due to today’s advancement in technology and businesses, fraud detection has become a critical component of financial transactions. Considering vast amounts of data in large datasets, it becomes more difficult to detect fraud transactions manually. In this research, we propose a combined method using both data mining and statistical tasks, utilizing feature selection, resampling and cost-...

Journal title:
  • J. of Management Information Systems

Volume 25

Publication date: 2009