Overview of streaming-data algorithms

Author

  • T. Soni Madhulatha

Abstract

Recent advances in data collection techniques mean that massive amounts of data are being collected at an extremely fast pace, and these data are potentially unbounded. Unbounded sequences of data collected from sensors, equipment, and other sources are referred to as data streams. Various data mining tasks can be performed on data streams in search of interesting patterns. This paper studies one such task, clustering, which can serve as the first step in many knowledge discovery processes. By grouping stream data into homogeneous clusters, data miners can learn about data characteristics, which can then be developed into classification models for new data or predictive models for unknown events. Recent research on data-stream mining addresses applications that must process huge amounts of data, such as sensor data analysis and financial applications. For such analysis, single-pass algorithms that consume a small amount of memory are critical.
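As an illustration of what "single-pass" and "small memory" can mean for stream clustering, here is a minimal sketch of sequential (online) k-means. It is not taken from the paper: the function name `sequential_kmeans`, the choice to seed the centroids with the first k points, and the use of squared Euclidean distance are all assumptions made for this example. The sketch reads each point exactly once and stores only k centroids and k counters, regardless of how long the stream is.

```python
from typing import Iterable, List, Sequence


def sequential_kmeans(stream: Iterable[Sequence[float]], k: int) -> List[List[float]]:
    """Cluster a stream in one pass, keeping only k centroids and k counts.

    Hypothetical helper for illustration; not the method of any specific paper.
    """
    centroids: List[List[float]] = []
    counts: List[int] = []
    for point in stream:
        if len(centroids) < k:
            # Assumption: the first k points seed the centroids.
            centroids.append(list(point))
            counts.append(1)
            continue
        # Find the nearest centroid by squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
        )
        # Move that centroid toward the point so it stays the running mean
        # of every point assigned to it so far.
        counts[nearest] += 1
        step = 1.0 / counts[nearest]
        centroids[nearest] = [
            c + step * (p - c) for p, c in zip(point, centroids[nearest])
        ]
    return centroids


if __name__ == "__main__":
    # The stream is consumed through an iterator, so it is never stored.
    readings = iter([(0.0,), (10.0,), (2.0,), (12.0,)])
    print(sequential_kmeans(readings, k=2))
```

Because each centroid is updated as a running mean, no past point has to be revisited. The trade-off is that the result depends on arrival order and on the seeding assumption above.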

Similar resources

Hybrid algorithms for Job shop Scheduling Problem with Lot streaming and A Parallel Assembly Stage

In this paper, a job shop scheduling problem with a parallel assembly stage and lot streaming (LS) is considered for the first time in both machining and assembly stages. The lot streaming technique splits jobs into smaller sub-jobs so that successive operations can be overlapped. Hence, to solve the job shop scheduling problem with a parallel assembly stage and lot streaming, deci...

Modelling and Scheduling Lot Streaming Flexible Flow Lines

Although lot streaming scheduling is an active research field, lot streaming flexible flow lines problems have received far less attention than classical flow shops. This paper deals with scheduling jobs in lot streaming flexible flow line problems. The paper mathematically formulates the problem by a mixed integer linear programming model. This model solves small instances to optimality. Moreo...

Lot Streaming in No-wait Multi Product Flowshop Considering Sequence Dependent Setup Times and Position Based Learning Factors

This paper considers a no-wait multi-product flowshop scheduling problem with sequence-dependent setup times. Lot streaming divides the lots of products into portions called sublots in order to reduce lead times and work-in-process and to increase machine utilization rates. The objective is to minimize the makespan. To clarify the system, a mathematical model of the problem is presented. Sin...

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Streaming Variational Bayes

Overview • Large, streaming data sets are increasingly the norm • Inference for Big Data has generally been non-Bayesian • Advantages of Bayes: complex models, coherent treatment of uncertainty, etc. We deliver: • SDA-Bayes, a framework for Streaming, Distributed, Asynchronous Bayesian inference • Experiments demonstrating streaming topic discovery with comparable predictive performance to non-...

Group Testing in Statistical Signal Recovery

Over the past decade, we have seen a dramatic increase in our ability to collect massive data sets. Concomitantly, our need to process, compress, store, analyze, and summarize these data sets has grown as well. Scientific, engineering, medical, and industrial applications require that we carry out these tasks efficiently and reasonably accurately. Data streams are one type or model of massive ...


Journal title:
  • CoRR

Volume abs/1203.2000

Publication date 2011