AdaCC: cumulative cost-sensitive boosting for imbalanced classification
Authors
Abstract
Class imbalance poses a major challenge for machine learning, as most supervised models exhibit bias towards the majority class and under-perform on the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, typically via a user-defined, fixed misclassification cost matrix provided as input to the learner. Tuning such parameters is a challenging task that requires domain knowledge; moreover, wrong adjustments may deteriorate overall predictive performance. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to the model's performance, instead of using a fixed cost matrix. Our method, called AdaCC, is parameter-free, as it relies on the cumulative behavior of the model to adjust the costs for the next boosting round, and it comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art approaches, with consistent improvements across measures: for instance, in the range [0.3–28.56%] for AUC, [3.4–21.4%] for balanced accuracy, [4.8–45%] for gmean, and [7.4–85.5%] for recall.
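The core idea in the abstract — boosting in which the minority-class cost is re-derived each round from the ensemble's cumulative behavior rather than read from a fixed cost matrix — can be sketched roughly as follows. This is a minimal illustration only: the specific cost formula (1 + cumulative false-negative rate), the function names, and the use of depth-1 trees are assumptions for the sketch, not the paper's actual AdaCC update.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cumulative_cost_boost(X, y, n_rounds=20):
    """Illustrative sketch (not the paper's exact algorithm): AdaBoost-style
    training where each round's sample weights are additionally scaled by a
    cost derived from the ensemble's cumulative minority-class error so far.
    Assumes binary labels in {0, 1}, with 1 the minority class."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    y_signed = np.where(y == 1, 1.0, -1.0)
    margin = np.zeros(n)          # cumulative weighted ensemble vote per sample
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred_signed = np.where(stump.predict(X) == 1, 1.0, -1.0)
        err = np.clip(w[pred_signed != y_signed].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        learners.append(stump)
        alphas.append(alpha)
        margin += alpha * pred_signed
        # "Cumulative behavior": fraction of minority instances the current
        # ensemble misclassifies drives the extra cost for the next round.
        pos = y == 1
        fnr = float(np.mean(margin[pos] <= 0)) if pos.any() else 0.0
        cost = np.where(pos, 1.0 + fnr, 1.0)  # assumed cost form, illustrative
        w = w * np.exp(-alpha * y_signed * pred_signed) * cost
        w /= w.sum()
    return learners, alphas

def boost_predict(learners, alphas, X):
    votes = sum(a * np.where(h.predict(X) == 1, 1.0, -1.0)
                for h, a in zip(learners, alphas))
    return (votes > 0).astype(int)
```

The key contrast with standard cost-sensitive boosting is that `cost` is recomputed every round from the ensemble's own false-negative rate, so no misclassification cost matrix needs to be supplied or tuned by the user.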
Similar resources
Cost-Sensitive Boosting for Classification of Imbalanced Data
The classification of data with imbalanced class distributions has posed a significant drawback in the performance attainable by most well-developed classification systems, which assume relatively balanced class distributions. This problem is especially crucial in many application domains, such as medical diagnosis, fraud detection, network intrusion, etc., which are of great importance in mach...
Cost-sensitive decision tree ensembles for effective imbalanced classification
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximising overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on over...
Boosting Cost-Sensitive Trees
This paper explores two techniques for boosting cost-sensitive trees. The two techniques differ in whether the misclassification cost information is utilized during training. We demonstrate that each of these techniques is good at different aspects of cost-sensitive classifications. We also show that both techniques provide a means to overcome the weaknesses of their base cost-sensitive tree induct...
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...
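The cluster-based under-sampling idea that CUSBoost combines with boosting can be sketched as below: partition the majority class into clusters and draw samples from each one, so the reduced majority set still covers its original structure. The function name, the per-cluster sampling scheme, and all parameters are illustrative assumptions, not CUSBoost's actual procedure or API.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_undersample(X_maj, n_keep, k=5, seed=0):
    """Illustrative sketch of cluster-based under-sampling: cluster the
    majority-class samples with k-means, then draw roughly n_keep / k
    samples from each cluster, preserving majority-class diversity
    better than uniform random under-sampling would."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_maj)
    rng = np.random.default_rng(seed)
    per_cluster = max(1, n_keep // k)
    kept = []
    for c in range(k):
        idx = np.flatnonzero(km.labels_ == c)
        take = min(per_cluster, len(idx))
        kept.extend(rng.choice(idx, size=take, replace=False))
    return X_maj[np.array(kept)]
```

The reduced majority set would then be concatenated with the full minority set before each boosting round's base learner is trained.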
Boosting Trees for Cost-Sensitive Classifications
This paper explores two boosting techniques for cost-sensitive tree classifications in the situation where misclassification costs change very often. Ideally, one would like to have only one induction, and use the induced model for different misclassification costs. Thus, it demands robustness of the induced model against cost changes. Combining multiple trees gives robust predictions against this ...
Journal
Journal title: Knowledge and Information Systems
Year: 2022
ISSN: 0219-3116, 0219-1377
DOI: https://doi.org/10.1007/s10115-022-01780-8