نتایج جستجو برای: imbalanced data sampling

تعداد نتایج: 2528204  

Journal: :CoRR 2013
Raman Singh Harish Kumar R. K. Singla

Network traffic data is huge, varying and imbalanced because various classes are not equally distributed. Machine learning (ML) algorithms for traffic analysis uses the samples from this data to recommend the actions to be taken by the network administrators as well as training. Due to imbalances in dataset, it is difficult to train machine learning algorithms for traffic analysis and these may...

2017
Shin Ando Chun-Yuan Huang

Class imbalance is a challenging issue in practical classification problems for deep learning models as well as traditional models. Traditionally successful countermeasures such as synthetic oversampling have had limited success with complex, structured data handled by deep learning models. In this paper, we propose Deep Over-sampling (DOS), a framework for extending the synthetic over-sampling...

2014
Ali Zughrat

Support Vector Machines (SVMs) is a popular machine learning technique, which has proven to be very effective in solving many classical problems with balanced data sets in various application areas. However, this technique is also said to perform poorly when it is applied to the problem of learning from heavily imbalanced data sets where the majority classes significantly outnumber the minority...

2008
Chris Seiffert Taghi M. Khoshgoftaar Jason Van Hulse Amri Napolitano

Building useful classification models can be a challenging endeavor, especially when training data is imbalanced. Class imbalance presents a problem when traditional classification algorithms are applied. These algorithms often attempt to build models with the goal of maximizing overall classification accuracy. While such a model may be very accurate, it is often not very useful. Consider the d...

Journal: :Pattern Recognition Letters 2015
Annarita D'Addabbo Rosalia Maglietta

Several applications aim to identify rare events from very large data sets. Classification algorithms may present great limitations on large data sets and show a performance degradation due to class imbalance. Many solutions have been presented in literature to deal with the problem of huge amount of data or imbalancing separately. In this paper we assessed the performances of a novel method, P...

2012
Madhuri Agrawal Gajendra Singh Ravindra Kumar Gupta

The paper addresses some theoretical and practical aspects of data mining, focusing on predictive data mining, where two central types of prediction problems are discussed: classification and regression. Further accent is made on predictive data mining, where the time-stamped data greatly increase the dimensions and complexity of problem solving. The main goal is through processing of data (rec...

Journal: :CoRR 2016
Maureen Lyndel C. Lauron Jaderick P. Pabico

This paper presents the performance of a classifier built using the stackingC algorithm in nine different data sets. Each data set is generated using a sampling technique applied on the original imbalanced data set. Five new sampling techniques are proposed in this paper (i.e., SMOTERandRep, Lax Random Oversampling, Lax Random Undersampling, Combined-Lax Random Oversampling Undersampling, and C...

Journal: :Neurocomputing 2014
Ming Gao Xia Hong Sheng Chen Christopher J. Harris Emad Khalaf

This contribution proposes a novel probability density function (PDF) estimation based over-sampling (PDFOS) approach for two-class imbalanced classification problems. The classical Parzen-window kernel function is adopted to estimate the PDF of the positive class. Then according to the estimated PDF, synthetic instances are generated as the additional training data. The essential concept is to...

2016
Alaa Tharwat Yasmine S. Moemen Aboul Ella Hassanien

Measuring toxicity is one of the main steps in drug development. Hence, there is a high demand for computational models to predict the toxicity effects of the potential drugs. In this study, we used a dataset, which consists of four toxicity effects:mutagenic, tumorigenic, irritant and reproductive effects. The proposed model consists of three phases. In the first phase, rough set-based methods...

Journal: :ECTI Transactions on Computer and Information Technology (ECTI-CIT) 2019

نمودار تعداد نتایج جستجو در هر سال

با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید