Active Learning from Stream Data

Author

  • Prakash Jayant Kulkarni
Abstract

In this paper, we propose a new research problem: active learning from data streams, where data volumes grow continuously. The objective is to label a small portion of the stream data and derive from it a model that predicts future instances as accurately as possible. We propose a classifier-ensemble-based active learning framework that selectively labels instances from data streams to build an ensemble classifier. A classifier ensemble's variance corresponds directly to its error rate, so reducing the ensemble's variance is equivalent to improving its prediction accuracy. We introduce a Minimum-Variance (MV) principle to guide the instance-labeling process for data streams. The MV principle and an optimal weighting module are combined to build the active learning framework for data streams.

Index Terms: Active learning, classifier ensemble, stream data.
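
The abstract does not spell out how instances are chosen, so the following is only a minimal sketch of one plausible reading of variance-guided labeling: within a chunk of the stream, spend the labeling budget on the instances where the ensemble members disagree most. The function names, the variance formula (spread of member class probabilities around their mean), and the per-chunk `budget` parameter are illustrative assumptions, not the paper's actual procedure.

```python
from statistics import fmean


def ensemble_variance(member_probs):
    """Spread of the members' class-probability vectors around their mean.

    member_probs: one list per classifier, each holding one probability
    per class. This variance definition is an assumption for illustration.
    """
    n_classes = len(member_probs[0])
    total = 0.0
    for c in range(n_classes):
        column = [probs[c] for probs in member_probs]
        mean_c = fmean(column)
        total += fmean([(p - mean_c) ** 2 for p in column])
    return total


def select_for_labeling(chunk_predictions, budget):
    """Return indices of the `budget` instances with the largest variance."""
    scores = [ensemble_variance(p) for p in chunk_predictions]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Hypothetical predictions of a 3-member ensemble on 3 unlabeled instances.
predictions = [
    [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15]],  # members largely agree
    [[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]],  # members strongly disagree
    [[0.60, 0.40], [0.40, 0.60], [0.50, 0.50]],  # members mildly disagree
]
selected = select_for_labeling(predictions, budget=2)
print(selected)
```

Labeling the selected instances and retraining (or reweighting) the ensemble members would then be the step that reduces the ensemble's variance on later chunks; that step is omitted here because the abstract gives no detail on it.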

Similar resources

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

A data stream is a sequence of data generated by various information sources at high speed and in high volume. Classifying data streams faces three challenges: unlimited length, online processing, and concept drift. In related research, the challenge of unlimited stream length is commonly met by dividing the stream into fixed-size windows or by using gradual forgetting. Concept drift refer...

Augmented Query Strategies for Active Learning in Stream Data Mining

Active learning is used in situations where unlabeled data is abundant but manual labeling is costly. Depending on the available budget, only a subset of the unlabeled instances is selected and sent to the oracle for manual labeling. Thus, the query strategy, i.e., how relevant instances are selected to be sent to the oracle, plays an important role in ...

Mining Multi-Label Data Streams Using Ensemble-Based Active Learning

Data stream classification has drawn increasing attention from the data mining community in recent years, where a large number of stream classification models were proposed. However, most existing models were merely focused on mining from single-label data streams. Mining from multi-label data streams has not been fully addressed yet. On the other hand, although some recent work touched the mul...

Web Services for Stream Mining: A Stream-Based Active Learning Use Case

The nature of data on the Web is becoming more and more stream-oriented, and in this context the idea of mining Web-generated data streams is becoming a hot topic. Web services, on the other hand, have become an inevitable tool for the future development of the Web. While Web services have been very successful in providing distributed computing environments, they have not been exploited for buil...

Online Passive Aggressive Active Learning and Its Applications

We investigate online active learning techniques for classification tasks in data stream mining applications. Unlike traditional learning approaches (either batch or online learning), which often need to request the class label of each incoming instance, online active learning queries only a subset of informative incoming instances to update the classification model, which aims to maximize cla...

Publication date: 2011