smote

Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE

Journal: CoRR 2017

Georgios Douzas, Fernando Baçao

Classification of imbalanced datasets is a challenging task for standard algorithms. Although many methods exist to address this problem in different ways, generating artificial data for the minority class is a more general approach than algorithmic modifications. The SMOTE algorithm and its variations generate synthetic samples along a line segment that joins minority class instances. In th...
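Several abstracts in this listing build on the same core step: SMOTE creates a synthetic minority sample on the line segment joining a minority instance to one of its nearest minority-class neighbours. The following is a minimal Python sketch of that interpolation, assuming numeric feature tuples, Euclidean distance, and base samples drawn at random (the original SMOTE description instead iterates over every minority sample a fixed number of times). `smote` and `k_nearest` are hypothetical names, not a library API.

```python
import math
import random


def k_nearest(points, i, k):
    """Return indices of the k nearest neighbours of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote(minority, n_synthetic, k=5, seed=None):
    """Create n_synthetic points, each on the segment joining a minority sample
    to one of its k nearest minority neighbours (illustrative sketch only)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # base sample, chosen at random (assumption)
        j = rng.choice(neighbours[i])     # one of its k nearest minority neighbours
        gap = rng.random()                # interpolation factor in [0, 1)
        base, nb = minority[i], minority[j]
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic


if __name__ == "__main__":
    minority = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8), (2.0, 1.5)]
    for point in smote(minority, n_synthetic=3, k=2, seed=42):
        print(point)
```

Each new point is a convex combination of two existing minority samples, so it always lies on the segment between them; the `seed` argument only makes the sketch reproducible.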

Preprocessing noisy imbalanced datasets using SMOTE enhanced with fuzzy rough prototype selection

Journal: Appl. Soft Comput. 2014

Nele Verbiest, Enislay Ramentol, Chris Cornelis, Francisco Herrera

The Synthetic Minority Over Sampling TEchnique (SMOTE) is a widely used technique to balance imbalanced data. In this paper we focus on improving SMOTE in the presence of class noise. Many improvements of SMOTE have been proposed, mostly cleaning or improving the data after applying SMOTE. Our approach differs from these approaches by the fact that it cleans the data before applying SMOTE, such...
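The abstract above describes cleaning the data before applying SMOTE. The paper's own cleaning step is fuzzy rough prototype selection, whose details are not given here, so the sketch below swaps in a much simpler, clearly different rule in the style of Wilson's edited nearest neighbour: a minority sample is dropped when fewer than half of its k nearest neighbours (searched over the whole dataset) are minority samples. `enn_filter_minority` is a hypothetical name; the filtered minority samples would then be passed to a SMOTE step.

```python
import math
from collections import Counter


def enn_filter_minority(X, y, minority_label, k=3):
    """Simple ENN-style pre-cleaning (NOT the fuzzy rough method from the paper):
    drop minority samples whose k nearest neighbours do not contain a strict
    majority of minority labels. Majority-class samples are always kept."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority_label:
            dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
            votes = Counter(y[j] for _, j in dists[:k])
            if votes[minority_label] * 2 <= k:
                continue  # treated as noise: neighbourhood dominated by other classes
        kept_X.append(xi)
        kept_y.append(yi)
    return kept_X, kept_y


if __name__ == "__main__":
    X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0),
         (5.0, 5.1), (5.1, 5.0), (4.9, 5.0), (5.0, 4.9)]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    X_clean, y_clean = enn_filter_minority(X, y, minority_label=1, k=3)
    print(len(X_clean), (5.0, 5.0) in X_clean)  # prints: 7 False
```

In the usage example, the minority point at (5.0, 5.0) sits inside the majority cluster and is removed, while the three clustered minority points survive; cleaning first means SMOTE never interpolates towards that isolated point.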

Oversampling Method for Imbalanced Classification

Journal: Computing and Informatics 2015

Zhuoyuan Zheng, Yunpeng Cai, Ye Li

The classification problem for imbalanced datasets is pervasive in many data mining domains. Imbalanced classification has been a hot topic in the academic community. From the data level to the algorithm level, many solutions have been proposed to tackle the problems resulting from imbalanced datasets. SMOTE is the most popular data-level method, and many derivations based on it have been developed to ...

Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem

2009

Chumphol Bunkhumpornpat, Krung Sinapiromsaran, Chidchanok Lursinsap

The class imbalance problem occurs in various disciplines when one of the target classes has a tiny number of instances compared to the other classes. A typical classifier normally ignores or fails to detect a minority class because it has so few instances. SMOTE is an over-sampling technique that remedies this situation. It generates minority instances within the overlapping regio...

SMOTE for Learning from Imbalanced Data: Progress and Challenges. Marking the 15-year Anniversary

2018

Alberto Fernández, Francisco Herrera, Nitesh V. Chawla

The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm has been established as a “de facto” standard in the framework of learning from imbalanced data. This is due to the simplicity of its design, as well as its robustness when applied to different types of problems. Since its publication in 2002, it has proven successful in a number of different applicati...

SMOTE-IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering

Journal: Inf. Sci. 2015

José A. Sáez, Julián Luengo, Jerzy Stefanowski, Francisco Herrera

Classification datasets often have an unequal class distribution among their examples. This problem is known as imbalanced classification. The Synthetic Minority Over-sampling Technique (SMOTE) is one of the most well-known data pre-processing methods for coping with it and balancing the number of examples in each class. However, as recent works claim, class imbalance is not a problem in itself...

Improving SMOTE with Fuzzy Rough Prototype Selection to Detect Noise in Imbalanced Classification Data

2012

Nele Verbiest, Enislay Ramentol, Chris Cornelis, Francisco Herrera

In this paper, we present a prototype selection technique for imbalanced data, Fuzzy Rough Imbalanced Prototype Selection (FRIPS), to improve the quality of the artificial instances generated by the Synthetic Minority Over-sampling TEchnique (SMOTE). Using fuzzy rough set theory, the noise level of each instance is measured, and instances for which the noise level exceeds a certain threshold le...

A New Over-sample Method Based on Distribution Density

Journal: JCP 2014

Kuoyi Shao, Yun Zhai, Haifeng Sui, Changsheng Zhang, Nan Ma

In this paper, a new method is proposed for learning from imbalanced datasets based on the sample distribution density. In the proposed scheme, a model of sample distribution density is designed, followed by an improved SMOTE process, SDD-SMOTE, in which minority samples are oversampled according to the sample distribution density. Cross-validation results show that the proposed SDD-SMOTE method to s...

Managing Borderline and Noisy Examples in Imbalanced Classification by Combining SMOTE with Ensemble Filtering

2014

José A. Sáez, Julián Luengo, Jerzy Stefanowski, Francisco Herrera

Imbalanced data poses a great difficulty for most classifier-learning algorithms. However, as recent works claim, class imbalance is not a problem in itself, and performance degradation is also associated with other factors related to the distribution of the data, such as the presence of noisy and borderline examples in the areas surrounding class boundaries. This contribution proposes to extend...

A novel over-sampling method and its application to miRNA prediction

2013

Xuan Tho Dang, Osamu Hirose, Thammakorn Saethang, Vu Anh Tran, Lan Anh T. Nguyen, Mamoru Kubo, Yoichi Yamada, Kenji Satou

MicroRNAs (miRNAs) are short (~22 nt) non-coding RNAs that play an indispensable role in gene regulation in many biological processes. Most current computational, comparative, and non-comparative methods distinguish human precursor microRNA (pre-miRNA) hairpins from both genome pseudo hairpins and other non-coding RNAs (ncRNAs). Although a few approaches have achieved promising ...
