Evolving GP Classifiers for Streaming Data Tasks with Concept Change and Label Budgets: A Benchmarking Study

Authors

  • Ali Vahdat
  • Jillian Morgan
  • Andrew R. McIntyre
  • Malcolm I. Heywood
  • A. Nur Zincir-Heywood
Abstract

Streaming data classification requires that several additional challenges are addressed that are not typically encountered in offline supervised learning formulations. Specifically, access to data at any training generation is limited to a small subset of the data, and the data itself is potentially generated by a non-stationary process. Moreover, there is a cost to requesting labels, thus a label budget is enforced. Finally, an anytime classification requirement implies that it must be possible to identify a ‘champion’ classifier for predicting labels as the stream progresses. In this work, we propose a general framework for deploying genetic programming (GP) to streaming data classification under these constraints. The framework consists of a sampling policy and an archiving policy that enforce criteria for selecting data to appear in a data subset. Only the exemplars of the data subset are labeled, and it is the content of the data subset that training epochs are performed against. Specific recommendations include support for GP task decomposition / modularity and making additional training epochs per data subset. Both recommendations make significant improvements to the baseline performance of GP under streaming data with label budgets. Benchmarking issues addressed include the identification of datasets and performance measures.
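The constraints described above (a small data subset, a label budget, and an anytime champion) can be illustrated with a minimal sketch. The abstract does not specify the actual policies, so everything here is an assumption made for illustration: uniform random sampling capped by the budget, first-in-first-out archiving, and plain Python callables standing in for GP individuals. The names `LabelBudgetSubset`, `offer`, and `champion` are hypothetical and do not come from the paper.

```python
import random
from collections import deque


class LabelBudgetSubset:
    """Fixed-size data subset filled from a stream under a label budget.

    Illustrative only: the sampling and archiving policies below are
    assumed stand-ins, not the policies evaluated in the paper.
    """

    def __init__(self, capacity, label_budget, seed=0):
        self.capacity = capacity          # maximum exemplars held for training
        self.label_budget = label_budget  # fraction of the stream we may label
        self.subset = deque()             # (features, label) pairs
        self.seen = 0                     # stream exemplars observed so far
        self.labels_requested = 0         # labels paid for so far
        self._rng = random.Random(seed)

    def offer(self, features, oracle):
        """Assumed sampling policy: uniform random, capped by the budget."""
        self.seen += 1
        within_budget = self.labels_requested < self.label_budget * self.seen
        if within_budget and self._rng.random() < self.label_budget:
            label = oracle(features)      # the only place a label is requested
            self.labels_requested += 1
            self._archive(features, label)
            return True
        return False

    def _archive(self, features, label):
        """Assumed archiving policy: evict the oldest exemplar when full."""
        if len(self.subset) >= self.capacity:
            self.subset.popleft()
        self.subset.append((features, label))


def champion(population, subset):
    """Anytime operation: pick the classifier most accurate on the subset."""
    def accuracy(classifier):
        hits = sum(classifier(x) == y for x, y in subset)
        return hits / max(len(subset), 1)
    return max(population, key=accuracy)
```

In use, each stream exemplar would be passed to `offer` together with a labeling function, training epochs would run against `subset` only, and `champion` could be called at any point to obtain the current best classifier for predicting labels on the stream.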

Related resources

On the Impact of Class Imbalance in GP Streaming Classification with Label Budgets

Streaming data scenarios introduce a set of requirements that do not exist under supervised learning paradigms typically employed for classification. Specific examples include anytime operation, non-stationary processes, and limited label budgets. From the perspective of class imbalance, this implies that it is not even possible to guarantee that all classes are present in the samples of data ...

Tapped Delay Lines for GP Streaming Data Classification with Label Budgets

Streaming data classification requires that a model be available for classifying stream content while simultaneously detecting and reacting to changes to the underlying process generating the data. Given that only a fraction of the stream is ‘visible’ at any point in time (i.e. some form of window interface), it is difficult to place any guarantee on a classifier encountering a ‘well mixed’...

Efficient Multi-label Classification for Evolving Data Streams

Many real-world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non-streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory. This paper proposes a new experimental framework for studying multi-label...

Streaming Multi-label Classification

This paper presents a new experimental framework for studying multi-label evolving stream classification, with efficient methods that combine the best practices in streaming scenarios with the best practices in multi-label classification. Many real-world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non strea...

MOA Concept Drift Active Learning Strategies for Streaming Data

We present a framework for active learning on evolving data streams, as an extension to the MOA system. In learning to classify streaming data, obtaining the true labels may require major effort and may incur excessive cost. Active learning focuses on learning an accurate model with as few labels as possible. Streaming data poses additional challenges for active learning, since the data distrib...

Publication date: 2015