Enhancing Unbalanced Data Classification with Cross-Validation and Extreme Gradient Boosting: A Comprehensive Analysis


Abstract

As a novel and efficient ensemble learning algorithm, XGBoost has been widely applied because of its many advantages, but its classification performance on imbalanced data is often unsatisfactory. To address this problem, the cross-validation procedure is optimized. The main idea is to combine cross-validation with the processing of unbalanced data and then obtain the final model through training. At the same time, optimal parameters are searched for and tuned automatically with optimization algorithms to achieve more accurate predictions. In the testing phase, the area under the curve (AUC) is used as the evaluation indicator to compare and analyze the performance of the various sampling methods and algorithm models. The results of the AUC analysis are expected to verify the feasibility and effectiveness of the proposed ...
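The abstract only outlines the procedure, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It shows one common way to combine cross-validation with resampling on unbalanced data: the resampling (assumed here to be plain random oversampling) is applied to the training folds only, and each untouched validation fold is scored with AUC. To stay dependency-free, a hypothetical centroid-distance scorer stands in for XGBoost; in practice `fit` would wrap a model such as `xgboost.XGBClassifier` and use its `predict_proba` output as the score. The synthetic data and every function name below are illustrative.

```python
import random
from typing import Callable, List, Optional, Sequence

Model = Callable[[List[float]], float]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    positive is scored above a random negative (ties count as half)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in the evaluation fold")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Split row indices into k folds that each keep roughly the class ratio."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, t in enumerate(labels) if t == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def random_oversample(X, y, rng: random.Random):
    """Duplicate random minority-class rows until both classes are equal in size."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = pos + neg + extra
    return [X[i] for i in keep], [y[i] for i in keep]


def fit_centroid_scorer(X, y) -> Model:
    """Hypothetical stand-in for XGBoost: scores a row by how much closer it
    is to the positive-class centroid than to the negative-class centroid."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    c_pos = centroid([x for x, t in zip(X, y) if t == 1])
    c_neg = centroid([x for x, t in zip(X, y) if t == 0])

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: dist2(x, c_neg) - dist2(x, c_pos)


def cv_auc(X, y, fit, k: int = 5, resample: Optional[Callable] = None, seed: int = 0) -> float:
    """Mean AUC over k stratified folds, resampling only the training part."""
    rng = random.Random(seed)
    folds = stratified_folds(y, k, rng)
    fold_aucs = []
    for f in range(k):
        val = set(folds[f])
        tr = [i for i in range(len(y)) if i not in val]
        X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
        if resample is not None:
            # Resampling after the split keeps the validation fold at the true
            # class distribution, so no duplicated row leaks into evaluation.
            X_tr, y_tr = resample(X_tr, y_tr, rng)
        model = fit(X_tr, y_tr)
        va = sorted(val)
        fold_aucs.append(auc([y[i] for i in va], [model(X[i]) for i in va]))
    return sum(fold_aucs) / k


def make_data(n_neg: int = 180, n_pos: int = 20, seed: int = 1):
    """Synthetic, illustrative 9:1 imbalanced data (not from the paper)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(n_pos)]
    return X, [0] * n_neg + [1] * n_pos


if __name__ == "__main__":
    X, y = make_data()
    for name, rs in [("no resampling", None), ("random oversampling", random_oversample)]:
        print(name, round(cv_auc(X, y, fit_centroid_scorer, k=5, resample=rs), 3))
```

Comparing the printed mean AUCs mirrors the abstract's comparison of sampling methods. Automatic parameter search can be layered on top by calling `cv_auc` once per candidate configuration and keeping the one with the highest mean AUC; which optimizer the authors actually use is not specified in the abstract.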


Similar Resources

Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data

Accurately predicting customer churn using large scale time-series data is a common problem facing many business domains. The creation of model features across various time windows for training and testing can be particularly challenging due to temporal issues common to time-series data. In this paper, we will explore the application of extreme gradient boosting (XGBoost) on a customer dataset ...


Bioactive Molecule Prediction Using Extreme Gradient Boosting.

Following the explosive growth in chemical and biological data, the shift from traditional methods of drug discovery to computer-aided means has made data mining and machine learning methods integral parts of today's drug discovery process. In this paper, extreme gradient boosting (Xgboost), which is an ensemble of Classification and Regression Tree (CART) and a variant of the Gradient Boosting...


Enhancing instance-based classification with local density: a new algorithm for classifying unbalanced biomedical data

MOTIVATION Classification is an important data mining task in biomedicine. In particular, classification on biomedical data often claims the separation of pathological and healthy samples with highest discriminatory performance for diagnostic issues. Even more important than the overall accuracy is the balance of a classifier, particularly if datasets of unbalanced class size are examined. RE...


Genetic Programming for Classification with Unbalanced Data

In classification, machine learning algorithms can suffer a performance bias when data sets are unbalanced. Binary data sets are unbalanced when one class is represented by only a small number of training examples (called the minority class), while the other class makes up the rest (majority class). In this scenario, the induced classifiers typically have high accuracy on the majority class but...


Gradient Boosting on Stochastic Data

where $R_A(T)$ is the regret of $A$ and is $o(T)$. To prove Proposition 4.3, we only need to show that Eqn. 5 holds for some $\gamma \in (0, 1]$. This is equivalent to showing that there exists a hypothesis $\tilde{h} \in \mathcal{H}$ ($\|\tilde{h}\| = 1$) such that $\langle \tilde{h}, f^* \rangle > 0$. To see this equivalence, let us assume that $\langle \tilde{h}, f^*/\|f^*\| \rangle = \epsilon > 0$. Let us set $h^* = \epsilon \|f^*\| \tilde{h}$. Using the Pythagorean theorem, we can see that $\|h^* - f^*\|^2 = (1 - \epsilon^2)\|f^*\|^2$. Hence we ...



Journal

Journal title: JITE (Journal of Informatics and Telecommunication Engineering)

Year: 2023

ISSN: 2549-6247, 2549-6255

DOI: https://doi.org/10.31289/jite.v7i1.8690