Adaptive Resampling with Active Learning

Authors

  • Seyda Ertekin
  • Jian Huang
  • C. Lee Giles
Abstract

This paper proposes a novel algorithm, Virtual Instances Resampling Technique Using Active Learning (VIRTUAL), for the class imbalance problem in Support Vector Machine (SVM) learning. In supervised learning, the prediction performance of classification algorithms deteriorates when the training set is imbalanced. The class imbalance problem occurs when at least one of the classes is represented by substantially fewer instances than the others in the training set. Various real-world classification tasks, such as medical diagnosis and text categorization, suffer from this phenomenon. VIRTUAL is a hybrid of oversampling and active learning that forms an adaptive technique for resampling the minority class instances. Unlike traditional resampling methods, which require preprocessing of the data, VIRTUAL generates virtual instances for the minority class support vectors during the training process and therefore removes the need for an extra preprocessing stage. Our empirical results show that VIRTUAL outperforms other competitive oversampling techniques and an active learning strategy in terms of prediction capability. In addition, VIRTUAL is more efficient in generating new instances and has a shorter training time than the other oversampling techniques, owing to its adaptive nature and its decision capability in creating virtual instances.
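
The abstract does not say how a virtual instance is built from a minority class support vector. The sketch below is only an illustration under one assumption: that a virtual instance is placed at a random point on the segment between the support vector and one of its nearest minority class neighbors, in the style of interpolation-based oversampling. The function names `nearest_neighbors` and `make_virtual_instances` and the parameters `k` and `n_new` are hypothetical, and the SVM training loop and the active learning selection step are left out entirely.

```python
import math
import random


def nearest_neighbors(point, candidates, k):
    """Return the k candidates closest to `point` (Euclidean distance).

    The point itself is skipped by identity, so pass the same object
    that is stored in `candidates`.
    """
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def make_virtual_instances(support_vector, minority, k=3, n_new=2, rng=None):
    """Create n_new synthetic minority points near one support vector.

    ASSUMPTION: interpolation toward a randomly chosen nearest minority
    neighbor; the abstract does not specify the generation rule.
    """
    rng = rng or random.Random(0)
    neighbors = nearest_neighbors(support_vector, minority, k)
    if not neighbors:
        return []  # no other minority instance to interpolate toward
    virtual = []
    for _ in range(n_new):
        neighbor = rng.choice(neighbors)
        t = rng.random()  # interpolation weight in [0, 1)
        virtual.append(
            tuple(s + t * (n - s) for s, n in zip(support_vector, neighbor))
        )
    return virtual


# Toy minority class; the first point plays the role of a support vector.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (3.0, 3.0)]
new_points = make_virtual_instances(minority[0], minority, k=2, n_new=3)
```

In an adaptive scheme of the kind the abstract describes, a routine like this would be called during training whenever a minority class instance becomes a support vector, rather than once over the whole dataset before training.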


Similar resources

A Resampling Technique for Relational Data Graphs

Resampling (a.k.a. bootstrapping) is a computationally intensive statistical technique for estimating the sampling distribution of an estimator. Resampling is used in many machine learning algorithms, including ensemble methods, active learning, and feature selection. Resampling techniques generate pseudosamples from an underlying population by sampling with replacement from a single sample data...


Adaptive Oversampling for Imbalanced Data Classification

Data imbalance is known to significantly hinder the generalization performance of supervised learning algorithms. A common strategy to overcome this challenge is synthetic oversampling, where synthetic minority class examples are generated to balance the distribution between the examples of the majority and minority classes. We present a novel adaptive oversampling algorithm, VIRTUAL, that comb...


An Adaptive Congestion Alleviating Protocol for Healthcare Applications in Wireless Body Sensor Networks: Learning Automata Approach

Wireless Body Sensor Networks (WBSNs) involve a convergence of biosensors, wireless communication and networks technologies. WBSN enables real-time healthcare services to users. Wireless sensors can be used to monitor patients’ physical conditions and transfer real time vital signs to the emergency center or individual doctors. Wireless networks are subject to more packet loss and congestion. T...


Active Learning in Cost-Sensitive Environments

Active learning techniques aim to reduce the amount of labeled data required for a supervised learner to achieve a certain level of performance. This can be very useful in domains where unlabeled data is easy to obtain but labelling data is costly. In this dissertation, I introduce methods of creating computationally efficient active learning techniques that handle different misclassification c...


Optimal Placement and Sizing of DGs and Shunt Capacitor Banks Simultaneously in Distribution Networks using Particle Swarm Optimization Algorithm Based on Adaptive Learning Strategy

Abstract: Optimizing DGs and capacitors is a nonlinear optimization problem with equality and inequality constraints, and meta-heuristic methods have proven efficient for solving optimization problems of any degree of complexity. As the population grows and electricity consumption increases, the need for generation increases, which further reduces voltage, increases los...




Publication date: 2009