Efficient k-NN Search on Streaming Data Series

Authors

  • Xiaoyan Liu
  • Hakan Ferhatosmanoglu
Abstract

Data streams are common in many recent applications, such as stock quotes, e-commerce data, system logs, and network traffic management. Compared with traditional databases, streaming databases pose new challenges for query processing because the data changes constantly over time. Index structures have been employed effectively in traditional databases to improve query performance. Index building time is not of particular interest in static databases because it is easily amortized by the performance gains at query time. In streaming databases, however, the data is dynamic, so index building time must be negligibly small for the index to be useful in continuous query processing. In this paper, we propose efficient index structures and algorithms for various models of k nearest neighbor (k-NN) queries on multiple data streams. We find scalar quantization to be a natural choice for data streams and propose index structures, called VA-Stream and VA+-Stream, which are built by dynamically quantizing the incoming dimensions. VA-Stream (and VA+-Stream) can be used both as a dynamic summary of the database and as an index structure to facilitate efficient similarity query processing. The proposed techniques are update-efficient, dynamic adaptations of the VA-file (vector approximation file) and the VA+-file, and are shown to achieve the same structures as their static versions. They can be generalized to handle aged queries, which are often used in trend-related analysis. A performance evaluation of VA-Stream and VA+-Stream shows that the index building time is negligibly small while query time is significantly improved.
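The general idea of a quantized summary that is updated as each new value arrives can be illustrated with a small sketch. The code below is not the authors' VA-Stream or VA+-Stream algorithm. It is a minimal illustration under several assumptions: each stream contributes a sliding window of its most recent values, every dimension uses the same uniform grid over a fixed range (the VA-file family chooses cell boundaries more carefully), and the distance is Euclidean. The class name `QuantizedWindowIndex` and all parameter names are hypothetical.

```python
import heapq
import math
from collections import deque


class QuantizedWindowIndex:
    """Toy quantized summary over sliding windows of several streams.

    Hypothetical illustration only: uniform cells over [lo, hi] per dimension.
    """

    def __init__(self, n_streams, window, bits=3, lo=0.0, hi=1.0):
        self.window = window
        self.cells = 2 ** bits
        self.lo = lo
        self.width = (hi - lo) / self.cells
        # deque(maxlen=...) drops the oldest dimension when a new one arrives.
        self.raw = [deque(maxlen=window) for _ in range(n_streams)]
        self.codes = [deque(maxlen=window) for _ in range(n_streams)]

    def _cell(self, value):
        # Out-of-range values are clamped into the first or last cell.
        idx = int((value - self.lo) / self.width)
        return min(max(idx, 0), self.cells - 1)

    def append(self, stream_id, value):
        """Quantize only the incoming value; older codes are left untouched."""
        self.raw[stream_id].append(value)
        self.codes[stream_id].append(self._cell(value))

    def _lower_bound(self, query, codes):
        """Distance from the query to the nearest point of the stream's cell box."""
        total = 0.0
        for q, c in zip(query, codes):
            # Edge cells are treated as unbounded so clamped values stay covered.
            left = -math.inf if c == 0 else self.lo + c * self.width
            right = math.inf if c == self.cells - 1 else self.lo + (c + 1) * self.width
            if q < left:
                total += (left - q) ** 2
            elif q > right:
                total += (q - right) ** 2
        return math.sqrt(total)

    def knn(self, query, k):
        """Return up to k (distance, stream_id) pairs, nearest first."""
        # Phase 1: scan only the compact codes and rank streams by lower bound.
        bounds = sorted(
            (self._lower_bound(query, codes), sid)
            for sid, codes in enumerate(self.codes)
            if len(codes) == self.window
        )
        # Phase 2: read exact values in lower-bound order and stop once the
        # next lower bound exceeds the current k-th best exact distance.
        best = []  # max-heap via negated distances
        for lb, sid in bounds:
            if len(best) == k and lb > -best[0][0]:
                break
            d = math.sqrt(sum((q - x) ** 2 for q, x in zip(query, self.raw[sid])))
            if len(best) < k:
                heapq.heappush(best, (-d, sid))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, sid))
        return sorted((-nd, sid) for nd, sid in best)
```

In this sketch an update touches a single code per stream, which is the property that keeps maintenance cost small, while the query still returns exact answers because the cell boxes only ever underestimate the true distance.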

Similar resources

An Efficient Algorithm for Streaming Time-Series Matching that Supports Normalization Transform

According to recent technical advances on sensors and mobile devices, processing of data streams generated by the devices is becoming an important research issue. The data stream of real values obtained at continuous time points is called streaming time-series. Due to the unique features of streaming time-series that are different from those of traditional time-series, similarity matching probl...

Expressing and Optimizing Similarity-Based Queries

Searching for similar objects (in terms of near and nearest neighbors) of a given query object from a large set is an essential task in many applications. Recent years have seen great progress towards efficient algorithms for this task. This paper takes a query language perspective, equipping SQL with the near and nearest search capability by adding a user-defined-predicate, called NN-UDP. The ...

Lot Streaming in No-wait Multi Product Flowshop Considering Sequence Dependent Setup Times and Position Based Learning Factors

This paper considers a no-wait multi product flowshop scheduling problem with sequence dependent setup times. Lot streaming divides the lots of products into portions called sublots in order to reduce the lead times and work-in-process, and increase the machine utilization rates. The objective is to minimize the makespan. To clarify the system, a mathematical model of the problem is presented. Sin...

Online Streaming Feature Selection Using Geometric Series of the Adjacency Matrix of Features

Feature Selection (FS) is an important pre-processing step in machine learning and data mining. All the traditional feature selection methods assume that the entire feature space is available from the beginning. However, online streaming features (OSF) are an integral part of many real-world applications. In OSF, the number of training examples is fixed while the number of features grows with t...

Similarity Search in Time Series Databases

In many application domains, data can be represented as a series of values (time series). Examples include stocks, seismic signals, audio, and many more. Similarity search in time series databases is an important research direction. Several methods have been proposed in order to provide algorithms for efficient query processing in the case of static time series of fixed length. Research in this...

Publication date: 2003