Comparison of ensemble hybrid sampling with bagging and boosting machine learning approach for imbalanced data

Abstract

Training on an imbalanced dataset can cause classifiers to overfit the majority class and increases the possibility of information loss for the minority class. Moreover, accuracy may not give a clear picture of a classifier's performance. This paper utilized decision tree (DT), support vector machine (SVM), artificial neural networks (ANN), K-nearest neighbors (KNN) and Naïve Bayes (NB), besides ensemble models like random forest (RF) and gradient boosting (GB), which use bagging and boosting methods respectively, together with three sampling approaches and seven performance metrics to investigate the effect of class imbalance on water quality data. Based on the results, the best model was [...] without resampling for almost all metrics except balanced accuracy, sensitivity and area under the curve (AUC), followed by [...] in terms of specificity, precision and AUC. However, for sensitivity, the highest value was achieved with the under-sampling dataset. Focusing on each metric separately, the results showed that for specificity and precision, it is better not to preprocess the data for the classifiers. Nevertheless, there was an improvement in both [...] when using resampled [...]
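To make the abstract's point about accuracy concrete, here is a minimal sketch in plain Python. It computes several of the metrics the abstract names (accuracy, balanced accuracy, sensitivity, specificity, precision) and applies a simple random under-sampling step. The function names, the toy labels and the 95/5 class split are assumptions made for illustration; they are not taken from the paper, and AUC is left out because it needs predicted scores rather than hard labels.

```python
import random
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def imbalance_metrics(y_true, y_pred, positive=1):
    """Compute a few metrics that behave differently under class imbalance."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = safe_div(tp, tp + fn)   # recall on the positive class
    specificity = safe_div(tn, tn + fp)   # recall on the negative class
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": safe_div(tp, tp + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def random_under_sample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (invented): 95 majority-class rows and 5 minority-class rows.
y_true = [0] * 95 + [1] * 5
X = [[i] for i in range(100)]

# A trivial "classifier" that always predicts the majority class.
y_pred = [0] * 100
m = imbalance_metrics(y_true, y_pred)

# Under-sample the majority class down to the minority-class size.
X_bal, y_bal = random_under_sample(X, y_true)
class_counts = Counter(y_bal)

print(m)
print(class_counts)
```

On this toy data the always-majority predictor scores high on accuracy while its sensitivity is zero and its balanced accuracy sits at chance level, which is the kind of gap that motivates reporting several metrics. The under-sampling helper only equalizes class counts; it says nothing about which sampling approaches the paper actually compared.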

Similar resources

Neighbourhood sampling in bagging for imbalanced data

Various approaches to extending bagging ensembles for class-imbalanced data are considered. First, we review known extensions and compare them in a comprehensive experimental study. The results show that integrating bagging with under-sampling is more powerful than integrating it with over-sampling. They also single out Roughly Balanced Bagging as the most accurate extension. Then, we point out that complex...

Hybrid probabilistic sampling with random subspace for imbalanced data learning

Class imbalance is one of the challenging problems for machine learning in many real-world applications. Other issues, such as within-class imbalance and high dimensionality, can exacerbate the problem. We propose a method, HPSDRS, that combines two ideas, the Hybrid Probabilistic Sampling (HPS) technique and a Diverse Random Subspace (DRS) ensemble, to address these issues. HPS improves the performance of traditi...

Ensemble-based hybrid probabilistic sampling for imbalanced data learning in lung nodule CAD

Classification plays a critical role in false positive reduction (FPR) in lung nodule computer-aided detection (CAD). The difficulty of FPR lies in the variation in the appearance of the nodules and the imbalanced distribution between the nodule and non-nodule classes. Moreover, the presence of inherent complex structures in data distribution, such as within-class imbalance and high-dimensionali...

A Novel Ensemble Method for Imbalanced Data Learning: Bagging of Extrapolation-SMOTE SVM

Class imbalance is ubiquitous in real life and has attracted much interest from various domains. Learning directly from an imbalanced dataset may yield unsatisfying results by overfocusing on the accuracy of identification and deriving a suboptimal model. Various methodologies have been developed to tackle this problem, including sampling, cost-sensitive learning, and other hybrid ones. However, the sa...

Building Useful Models from Imbalanced Data with Sampling and Boosting

Building useful classification models can be a challenging endeavor, especially when training data is imbalanced. Class imbalance presents a problem when traditional classification algorithms are applied. These algorithms often attempt to build models with the goal of maximizing overall classification accuracy. While such a model may be very accurate, it is often not very useful. Consider the d...

Journal

Journal title: Indonesian Journal of Electrical Engineering and Computer Science

Year: 2022

ISSN: 2502-4752, 2502-4760

DOI: https://doi.org/10.11591/ijeecs.v29.i1.pp598-608