imbalanced data sampling

Genetic Algorithm Based Over-Sampling with DNN in Classifying the Imbalanced Data Distribution Problem

Journal: Indian Journal of Science and Technology 2023

Objective: Data imbalance exists in many real-life applications. In imbalanced datasets, minority class data creates a wrong inference during classification, which leads to more misclassification. Much research has been done in the past to solve this issue, but as of now no globally working solution has been found for efficient classification. After analyzing the existing literature, it proposed mini...

Kernel Based Asymmetric Learning for Software Defect Prediction

Journal: IEICE Transactions 2012

Ying Ma, Guangchun Luo, Hao Chen

Software defect prediction aims to predict the defect-prone modules for the next release of software or for cross-project software. Real-world data mining applications, including the software defect prediction domain, must address the issue of learning from imbalanced data sets. As pointed out by Khoshgoftaar et al. [1] and Menzies et al. [2], the majority of defects in a software system are located in a...

Learning When Data Sets are Imbalanced and When Costs are Unequal and Unknown

2003

Marcus A. Maloof

The problem of learning from imbalanced data sets, while not the same problem as learning when misclassification costs are unequal and unknown, can be handled in a similar manner. That is, in both contexts, we can use techniques from ROC analysis to help with classifier design. We present results from two studies in which we dealt with skewed data sets and unequal, but unknown costs of error. W...
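
As a rough illustration of the ROC analysis this abstract refers to, the sketch below sweeps a decision threshold over classifier scores and computes the area under the resulting curve. The function names `roc_points` and `auc` and the toy scores are assumptions made for illustration; they are not taken from the paper.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) obtained by sweeping a threshold over scores.

    labels: 1 for the positive (minority) class, 0 for the negative class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sort by descending score so that each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Consume tied scores together so that ties yield a single segment.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy, made-up data: 2 positives among 8 examples (a skewed set).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0, 0, 0]
curve = roc_points(scores, labels)
print(curve)
print(auc(curve))
```

Because the curve is built from true-positive and false-positive rates, it does not depend on the class ratio or on a fixed cost assignment, which is why the same tool is useful in both settings the abstract names.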

The Classification of Imbalanced Spatial Data

2011

Alina Lazar, Bradley Shellito

This paper describes a method of improving the prediction of urbanization. The four datasets used in this study were extracted using Geographical Information Systems (GIS). Each dataset contains seven independent variables related to urban development and a class label which denotes the urban areas versus the rural areas. Two classification methods, Support Vector Machines (SVM) and Neural Netwo...

A New approach for Classification of Highly Imbalanced Datasets using Evolutionary Algorithms

2011

Satyam Maheshwari, Sanjeev Sharma

Much of today's research interest is in the application of evolutionary algorithms. One example is classification rules in imbalanced domains. The problem of imbalanced data sets poses a major challenge to the data mining community. In imbalanced data sets, the number of instances of one class is much higher than the others, and the class of fewer representatives is of more interest fro...

Text Sampling and Re-Sampling for Imbalanced Authorship Identification Cases

2006

Efstathios Stamatatos

Authorship identification can be seen as a single-label multi-class text categorization problem. Very often, there are extremely few training texts at least for some of the candidate authors. In this paper, we present methods to handle imbalanced multi-class textual datasets. The main idea is to segment the training texts into sub-samples according to the size of the class. Hence, minority clas...
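
The idea of segmenting training texts into sub-samples according to class size might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the helper `segment_by_class_size`, the whitespace tokenization, the choice to give every class the same number of chunks, and the toy corpus are all hypothetical.

```python
def segment_by_class_size(texts_by_author, samples_per_class):
    """Split each author's text into the same number of chunks.

    Assumption for illustration: every class gets `samples_per_class` chunks,
    so authors with little text yield short chunks and authors with a lot of
    text yield long ones.
    """
    segmented = {}
    for author, texts in texts_by_author.items():
        # Naive whitespace tokenization over all of the author's texts.
        tokens = " ".join(texts).split()
        n = min(samples_per_class, len(tokens))  # no more chunks than tokens
        if n == 0:
            segmented[author] = []
            continue
        base, extra = divmod(len(tokens), n)
        chunks, start = [], 0
        for i in range(n):
            # The first `extra` chunks take one more token so nothing is dropped.
            size = base + (1 if i < extra else 0)
            chunks.append(tokens[start:start + size])
            start += size
        segmented[author] = chunks
    return segmented


# Toy, made-up corpus: author_a has three times as much text as author_b.
corpus = {
    "author_a": ["one two three four five six seven eight nine ten eleven twelve"],
    "author_b": ["alpha beta gamma delta"],
}
result = segment_by_class_size(corpus, samples_per_class=4)
for author, chunks in result.items():
    print(author, [len(c) for c in chunks])
```

Under this assumption the classifier sees the same number of training samples per author, and the imbalance shows up in sample length instead of sample count.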

An Evaluation of Sampling on Filter-Based Feature Selection Methods

2010

Kehan Gao, Taghi M. Khoshgoftaar, Jason Van Hulse

Feature selection and data sampling are two of the most important data preprocessing activities in the practice of data mining. Feature selection is used to remove less important features from the training data set, while data sampling is an effective means for dealing with the class imbalance problem. While the impacts of feature selection and class imbalance have been frequently investigated ...
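
To make the two preprocessing steps concrete, here is a minimal sketch that applies random undersampling and then ranks features with a simple filter score. Everything in it is an assumption for illustration: the helpers `random_undersample` and `rank_features`, the use of the absolute difference of class means as a stand-in filter, and the toy data. The paper evaluates its own samplers and filters, which are not shown in this excerpt.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Drop rows of the larger class at random until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    if len(minority) > len(majority):
        minority, majority = majority, minority
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def rank_features(rows, labels):
    """Rank feature indices by absolute difference of class means (stand-in filter)."""
    n_features = len(rows[0])
    scores = []
    for f in range(n_features):
        pos = [r[f] for r, y in zip(rows, labels) if y == 1]
        neg = [r[f] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    # Highest-scoring (most class-separating) feature first.
    return sorted(range(n_features), key=lambda f: -scores[f])


# Toy, made-up data: feature 0 separates the classes, feature 1 is constant.
rows = [[0.1 * i, 5.0] for i in range(8)] + [[10.0, 5.0], [11.0, 5.0]]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_undersample(rows, labels)
ranking = rank_features(bal_rows, bal_labels)
print(len(bal_rows), ranking)
```

Whether sampling should come before or after feature selection is the kind of question such a study examines; the order used here is only one of the possibilities.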

INDUCING VALUABLE RULES FROM IMBALANCED DATA: THE CASE OF AN IRANIAN BANK EXPORT LOANS

Journal: International Journal of Information, Security and Systems Management 2013

SMOTE: Synthetic Minority Over-sampling Technique

Journal: J. Artif. Intell. Res. 2002

Kevin W. Bowyer, Nitesh V. Chawla, Lawrence O. Hall, W. Philip Kegelmeyer

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of “normal” examples with only a small percentage of “abnormal” or “interesting” examples. It is also the case that the cost of misclassifying an abnormal (i...
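
The truncated abstract does not reach the mechanism, but synthetic minority over-sampling is commonly described as creating new minority points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates that idea only. The function `smote_like`, its parameters, the random choice of base points, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import random


def smote_like(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Simplification: pick the base point at random on every draw.
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (squared Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Toy, made-up minority class with two features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like(minority, n_synthetic=6)
print(len(new_points))
```

Since each synthetic point lies on a segment between two existing minority points, the new samples stay within the region the minority class already occupies and are not exact duplicates of existing samples.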

Imbalanced Datasets: from Sampling to Classifiers

2013

T. Ryan Hoens, Nitesh V. Chawla

Classification is one of the most fundamental tasks in the machine learning and data-mining communities. One of the most common challenges faced when trying to perform classification is the class imbalance problem. A dataset is considered imbalanced if the class of interest (positive or minority class) is relatively rare as compared to the other classes (negative or majority classes). As a resu...
