PoBery: Possibly-complete Big Data Queries with Probabilistic Data Placement and Scanning

Authors

Abstract

In big data query processing, there is a trade-off between query accuracy and query efficiency; for example, sampling query approaches trade off query completeness for efficiency. In this article, we argue that query performance can be significantly improved by slightly losing the possibility of query completeness, that is, the chance that a query is complete. To quantify this possibility, we define a new concept, Probability of query Completeness (hereinafter referred to as PC). For example, if a query is executed 100 times, PC = 0.95 guarantees that there are no more than 5 incomplete results among the 100 results. Leveraging probabilistic data placement and scanning, we trade off PC for query performance. We propose PoBery (POssibly-complete Big data quERY), a method that supports neither complete queries nor incomplete queries, but possibly-complete queries. The experimental results conducted on HiBench prove that PoBery can significantly accelerate queries while ensuring the PC. Specifically, it is guaranteed that the percentage of complete queries is larger than the given PC confidence. Through comparison with state-of-the-art key-value stores, we show that while Drill-based PoBery performs as fast as Drill on complete queries, it is 1.7×, 1.1×, and 1.5× faster on average than Drill, Impala, and Hive, respectively, ...
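The PC arithmetic in the abstract can be illustrated with a minimal sketch. This is not PoBery's implementation: the helper names `max_incomplete`, `empirical_pc`, and `satisfies_pc` are hypothetical, and the sketch assumes each query run is simply labeled complete or incomplete.

```python
import math
from typing import Sequence


def max_incomplete(runs: int, pc: float) -> int:
    """Largest number of incomplete results tolerated among `runs` executions."""
    # Rounding first guards against floating-point noise,
    # e.g. 100 * (1 - 0.95) evaluates to 5.000000000000004.
    return math.floor(round(runs * (1.0 - pc), 9))


def empirical_pc(complete_flags: Sequence[bool]) -> float:
    """Observed fraction of complete results (hypothetical helper)."""
    return sum(complete_flags) / len(complete_flags)


def satisfies_pc(complete_flags: Sequence[bool], pc: float) -> bool:
    """True if the observed runs stay within the incomplete-result budget."""
    incomplete = len(complete_flags) - sum(complete_flags)
    return incomplete <= max_incomplete(len(complete_flags), pc)


if __name__ == "__main__":
    # 100 executions, 5 of them incomplete: within the PC = 0.95 budget.
    flags = [True] * 95 + [False] * 5
    print(max_incomplete(100, 0.95), empirical_pc(flags), satisfies_pc(flags, 0.95))
```

With 100 runs and PC = 0.95, the budget is 5 incomplete results, which matches the example given in the abstract; a sixth incomplete result would violate it.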

Similar resources

Constellation Queries over Big Data

A geometrical pattern is a set of points with all pairwise distances (or, more generally, relative distances) specified. Finding matches to such patterns has applications to spatial data in seismic, astronomical, and transportation contexts. For example, a particularly interesting geometric pattern in astronomy is the Einstein cross, which is an astronomical phenomenon in which a single quasar ...

Big Data from CT Scanning

Over 100 million X-ray CT scans are performed worldwide each year. In most cases, the scan projection or sinogram data are discarded after the images are read. This represents a huge waste of big data, and an opportunity to develop new methods for better image reconstruction and high dose efficiency. Here we present an initial attempt to archive, utilize and share big data from CT scanning. In thi...

Redoop Infrastructure for Recurring Big Data Queries

This demonstration presents the Redoop infrastructure, the first full-fledged MapReduce framework with native support for recurring big data queries. Recurring queries, repeatedly being executed for long periods of time over evolving high-volume data, have become a bedrock component in most large-scale data analytic applications. Redoop is a comprehensive extension to Hadoop that pushes the supp...

Gumbo: Guarded Fragment Queries over Big Data

We present Gumbo, a system for the efficient evaluation of guarded fragment queries on top of Hadoop and Spark. A key asset of Gumbo is the reduced number of jobs in comparison with recent systems such as Pig, Hive or Shark. For unnested guarded fragment queries, Gumbo even provides a constant bound on the number of jobs independent of the size of the query. In the demo, we will address the fol...

Big Data: Theoretical Aspects [Scanning the Issue]

Big data has burst into public awareness over the past few years as people have become more and more aware of the massive amount of data being produced by social and scientific activities, and its potential utilization for good or harm. On the research front, big data has spurred new activity across a range of fields, including statistics, machine learning, and computer systems. Many areas have...

Journal

Journal title: ACM/IMS Transactions on Data Science

Year: 2021

ISSN: 2691-1922, 2577-3224

DOI: https://doi.org/10.1145/3465375