AdaCC: cumulative cost-sensitive boosting for imbalanced classification
Authors
Abstract
Class imbalance poses a major challenge for machine learning, as most supervised models exhibit bias towards the majority class and under-perform on the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, typically via a user-defined, fixed misclassification cost matrix provided as input to the learner. Tuning such parameters is a challenging task that requires domain knowledge; moreover, wrong adjustments may deteriorate overall predictive performance. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to the model's performance, instead of using a fixed cost matrix. Our method, called AdaCC, is parameter-free, as it relies on the cumulative behavior of the model to adjust the costs for the next boosting round, and it comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art approaches, with consistent improvements across measures: for instance, in the range [0.3–28.56%] for AUC, [3.4–21.4%] for balanced accuracy, [4.8–45%] for gmean, and [7.4–85.5%] for recall.
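The core idea in the abstract — boosting in which the minority-class cost is re-derived each round from the ensemble's cumulative behavior rather than read from a fixed cost matrix — can be sketched roughly as follows. This is a minimal illustration only: the specific cost formula (1 + cumulative false-negative rate), the function names, and the use of depth-1 trees are assumptions for the sketch, not the paper's actual AdaCC update.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def cumulative_cost_boost(X, y, n_rounds=20):
    """Illustrative sketch (not the paper's exact algorithm): AdaBoost-style
    training where each round's sample weights are additionally scaled by a
    cost derived from the ensemble's cumulative minority-class error so far.
    Assumes binary labels in {0, 1}, with 1 the minority class."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    y_signed = np.where(y == 1, 1.0, -1.0)
    margin = np.zeros(n)          # cumulative weighted ensemble vote per sample
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred_signed = np.where(stump.predict(X) == 1, 1.0, -1.0)
        err = np.clip(w[pred_signed != y_signed].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        learners.append(stump)
        alphas.append(alpha)
        margin += alpha * pred_signed
        # "Cumulative behavior": fraction of minority instances the current
        # ensemble misclassifies drives the extra cost for the next round.
        pos = y == 1
        fnr = float(np.mean(margin[pos] <= 0)) if pos.any() else 0.0
        cost = np.where(pos, 1.0 + fnr, 1.0)  # assumed cost form, illustrative
        w = w * np.exp(-alpha * y_signed * pred_signed) * cost
        w /= w.sum()
    return learners, alphas

def boost_predict(learners, alphas, X):
    votes = sum(a * np.where(h.predict(X) == 1, 1.0, -1.0)
                for h, a in zip(learners, alphas))
    return (votes > 0).astype(int)
```

The key contrast with standard cost-sensitive boosting is that `cost` is recomputed every round from the ensemble's own false-negative rate, so no misclassification cost matrix needs to be supplied or tuned by the user.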
Similar resources
Cost-Sensitive Boosting for Classification of Imbalanced Data
The classification of data with imbalanced class distributions has posed a significant drawback in the performance attainable by most well-developed classification systems, which assume relatively balanced class distributions. This problem is especially crucial in many application domains, such as medical diagnosis, fraud detection, network intrusion, etc., which are of great importance in mach...
Cost-sensitive decision tree ensembles for effective imbalanced classification
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximising overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on over...
Boosting Cost-Sensitive Trees
This paper explores two techniques for boosting cost-sensitive trees. The two techniques differ in whether the misclassification cost information is utilized during training. We demonstrate that each of these techniques is good at different aspects of cost-sensitive classifications. We also show that both techniques provide a means to overcome the weaknesses of their base cost-sensitive tree induct...
CUSBoost: Cluster-based Under-sampling with Boosting for Imbalanced Classification
Class imbalance classification is a challenging research problem in data mining and machine learning, as most of the real-life datasets are often imbalanced in nature. Existing learning algorithms maximise the classification accuracy by correctly classifying the majority class, but misclassify the minority class. However, the minority class instances are representing the concept with greater in...
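The cluster-based under-sampling idea that CUSBoost combines with boosting can be sketched as below: partition the majority class into clusters and draw samples from each one, so the reduced majority set still covers its original structure. The function name, the per-cluster sampling scheme, and all parameters are illustrative assumptions, not CUSBoost's actual procedure or API.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_undersample(X_maj, n_keep, k=5, seed=0):
    """Illustrative sketch of cluster-based under-sampling: cluster the
    majority-class samples with k-means, then draw roughly n_keep / k
    samples from each cluster, preserving majority-class diversity
    better than uniform random under-sampling would."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_maj)
    rng = np.random.default_rng(seed)
    per_cluster = max(1, n_keep // k)
    kept = []
    for c in range(k):
        idx = np.flatnonzero(km.labels_ == c)
        take = min(per_cluster, len(idx))
        kept.extend(rng.choice(idx, size=take, replace=False))
    return X_maj[np.array(kept)]
```

The reduced majority set would then be concatenated with the full minority set before each boosting round's base learner is trained.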
Boosting Trees for Cost-Sensitive Classifications
This paper explores two boosting techniques for cost-sensitive tree classifications in the situation where misclassification costs change very often. Ideally, one would like to have only one induction, and use the induced model for different misclassification costs. Thus, it demands robustness of the induced model against cost changes. Combining multiple trees gives robust predictions against this ...
Journal
Journal title: Knowledge and Information Systems
Year: 2022
ISSN: 0219-3116, 0219-1377
DOI: https://doi.org/10.1007/s10115-022-01780-8