Parallelizing Skyline Queries for Scalable Distribution

نویسندگان

  • Ping Wu
  • Caijie Zhang
  • Ying Feng
  • Ben Y. Zhao
  • Divyakant Agrawal
  • Amr El Abbadi
چکیده

Skyline queries help users make intelligent decisions over complex data, where different and often conflicting criteria are considered. Current skyline computation methods are restricted to centralized query processors, limiting scalability and imposing a single point of failure. In this paper, we address the problem of parallelizing skyline query execution over a large number of machines by leveraging content-based data partitioning. We present a novel distributed skyline query processing algorithm (DSL) that discovers skyline points progressively. We propose two mechanisms, recursive region partitioning and dynamic region encoding, to enforce a partial order on query propagation in order to pipeline query execution. Our analysis shows that DSL is optimal in terms of the total number of local query invocations across all machines. In addition, simulations and measurements of a deployed system show that our system load balances communication and processing costs across cluster machines, providing incremental scalability and significant performance improvement over alternative distribution mechanisms.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Skyline Diagram: Finding the Voronoi Counterpart for Skyline Queries

Skyline queries are important in many application domains. In this paper, we propose a novel structure Skyline Diagram, which given a set of points, partitions the plane into a set of regions, referred to as skyline polyominos. All query points in the same skyline polyomino have the same skyline query results. Similar to kth-order Voronoi diagram commonly used to facilitate k nearest neighbor (...

متن کامل

K-Dominant Skyline Computation by Using Sort-Filtering Method

Skyline queries are useful in many applications such as multicriteria decision making, data mining, and user preference queries. A skyline query returns a set of interesting data objects that are not dominated in all dimensions by any other objects. For a high-dimensional database, sometimes it returns too many data objects to analyze intensively. To reduce the number of returned objects and to...

متن کامل

Efficient Parallel Skyline Query Processing for High-Dimensional Data

Given a set of multidimensional data points, skyline queries retrieve those points that are not dominated by any other points in the set. Due to the ubiquitous use of skyline queries, such as in preference-based query answering and decision making, and the large amount of data that these queries have to deal with, enabling their scalable processing is of critical importance. However, there are ...

متن کامل

Skyline Ordering: A Flexible Framework for Efficient Resolution of Size Constraints on Skyline Queries

Given a set of multi-dimensional points, a skyline query returns the interesting points that are not dominated by other points. It has been observed that the actual cardinality (s) of a skyline query result may differ substantially from the desired result cardinality (k), which has prompted studies on how to reduce s for the case where k < s. This paper goes further by addressing the general ca...

متن کامل

k-dominant and Extended k-dominant Skyline

Skyline queries have recently attracted a lot of attention for its intuitive query formulation. It can act as a filter to discard sub-optimal objects. However, a major drawback of skyline is that, in datasets with many dimensions, the number of skyline objects becomes large and no longer offer any interesting insights. To solve the problem, k-dominant skyline queries have been introduced, which...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006