Selectivity estimation of range queries in data streams using micro-clustering

Authors

  • Sudhanshu Gupta
  • Deepak Garg
Abstract

Selectivity estimation is an important task for query optimization. Common data mining techniques are not applicable to large, fast, continuous data streams, because such streams require one-pass processing of the data. These requirements make Range Query Estimation (RQE) a challenging task. We propose a technique that performs RQE using micro-clustering. The technique maintains cluster statistics in the form of micro-clusters, which also store information about the distribution of the cluster values as cosine coefficients. These cosine coefficients are used to estimate range queries, and the estimation can cover a range of data values spread over several clusters. The technique has been compared with the cosine series technique for selectivity estimation. Experiments on both synthetic and real datasets of varying sizes confirm that our technique offers substantial improvements in accuracy over other methods.
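To make the role of the cosine coefficients concrete, the following is a minimal sketch of a generic one-pass cosine-series selectivity estimator. It is not the authors' implementation: the class name `CosineSketch`, the parameter `num_coeffs`, the `merge` helper, and the assumption that values are already normalised to [0, 1] are all illustrative choices. It keeps a single summary rather than one per micro-cluster. The density is approximated as f(x) ≈ 1 + 2 Σ c_k cos(kπx), with c_k the running mean of cos(kπx), and a range query integrates that series in closed form.

```python
import math


class CosineSketch:
    """Hypothetical one-pass cosine-series summary of values in [0, 1]."""

    def __init__(self, num_coeffs=16):
        self.n = 0
        # Running sums of cos(k * pi * x) for k = 1..num_coeffs.
        self.sums = [0.0] * num_coeffs

    def add(self, x):
        """Update the summary with one stream value (assumed in [0, 1])."""
        self.n += 1
        for k in range(1, len(self.sums) + 1):
            self.sums[k - 1] += math.cos(k * math.pi * x)

    def merge(self, other):
        """Combine two summaries; the running sums are simply additive."""
        self.n += other.n
        self.sums = [a + b for a, b in zip(self.sums, other.sums)]

    def selectivity(self, lo, hi):
        """Estimate the fraction of seen values falling in [lo, hi]."""
        if self.n == 0:
            return 0.0
        est = hi - lo  # contribution of the constant term
        for k, s in enumerate(self.sums, start=1):
            c_k = s / self.n
            w = k * math.pi
            est += 2.0 * c_k * (math.sin(w * hi) - math.sin(w * lo)) / w
        # A truncated series can overshoot slightly, so clamp to [0, 1].
        return min(1.0, max(0.0, est))


sketch = CosineSketch()
for i in range(1000):
    sketch.add((i + 0.5) / 1000)  # evenly spread values
print(sketch.selectivity(0.2, 0.5))
```

Because the summary consists only of additive running sums, it never needs a second pass over the data, and summaries kept for separate groups of values (for example, one per cluster, assuming a shared normalisation) can be combined with `merge` before answering a query that spans them.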

Similar resources

Selectivity Estimation over Multiple Data Streams using Micro-clustering

Selectivity estimation is an important task for query optimization. We propose a technique to perform range query estimation over multiple data streams using micro-clustering. The technique maintains cluster statistics in terms of micro-clusters and cosine series for all streams. These micro-clusters maintain data distribution information about the stream values using cosine coefficients. These ...


On Futuristic Query Processing in Data Streams

Recent advances in hardware technology have resulted in the ability to collect and process large amounts of data. In many cases, the collection of the data is a continuous process over time. Such continuous collections of data are referred to as data streams. One of the interesting problems in data stream mining is that of predictive query processing. This is useful for a variety of data mining...


A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries

Data streams are infinite, fast sequences of time-stamped data elements that arrive in bursts. Generally, these elements need to be processed online and in real time, so algorithms that process data streams and answer queries on them are mostly one-pass. Executing such algorithms poses challenges such as memory limitations, scheduling, and accuracy of answers. They will be more ...


Processing Queries on Road Networks in Spatial Data Base Perspective for Selectivity Estimation

This work focuses on building a framework capable of analyzing spatial approximate substring queries, mainly to solve the selectivity estimation problem for range queries over road networks represented in spatial databases. Selectivity estimation means estimating the size of the result, i.e., estimating the number of points present in a graph whi...


Query Selectivity Estimation Based on Improved V-optimal Histogram by Introducing Information about Distribution of Boundaries of Range Query Conditions

Selectivity is a parameter used by a query optimizer for early estimation of the size of the data that satisfies a query condition. It is calculated using an estimator of the distribution of values of the attribute involved in the processed query condition. Histograms built on attribute values from a database may serve as such a representation of the distribution. The paper introduces a ...


Journal:
  • Int. Arab J. Inf. Technol.

Volume 13 (issue not listed)

Publication date: 2016