LOOM: Optimal Aggregation Overlays for In-Memory Big Data Processing

نویسندگان

William Culhane

Kirill Kogan

Chamikara Jayalath

Patrick Th. Eugster

چکیده

Aggregation underlies the distillation of information from big data. Many well-known basic operations including top-k matching and word count hinge on fast aggregation across large data-sets. Common frameworks including MapReduce support aggregation, but do not explicitly consider or optimize it. Optimizing aggregation however becomes yet more relevant in recent “online” approaches to expressive big data analysis which store data in main memory across nodes. This shifts the bottlenecks from disk I/O to distributed computation and network communication and significantly increases the impact of aggregation time on total job completion time. This paper presents LOOM, a (sub)system for efficient big data aggregation for use within big data analysis frameworks. LOOM efficiently supports two-phased (sub)computations consisting in a first phase performed on individual data sub-sets (e.g., word count, top-k matching) followed by a second aggregation phase which consolidates individual results of the first phase (e.g., count sum, top-k). Using characteristics of an aggregation function, LOOM constructs a specifically configured aggregation overlay to minimize aggregation costs. We present optimality heuristics and experimentally demonstrate the benefits of thus optimizing aggregation overlays using microbenchmarks and real world examples.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Revisiting Aggregation for Data Intensive Applications: A Performance Study

Aggregation has been an important operation since the early days of relational databases. Today’s Big Data applications bring further challenges when processing aggregation queries, demanding adaptive aggregation algorithms that can process large volumes of data relative to a potentially limited memory budget (especially in multiuser settings). Despite its importance, the design and evaluation ...

متن کامل

A Hierarchy Topology Design Using a Hybrid Evolutionary Algorithm in Wireless Sensor Networks

Wireless sensor network a powerful network contains many wireless sensors with limited power resource, data processing, and transmission abilities. Wireless sensor capabilities including computational capacity, radio power, and memory capabilities are much limited. Moreover, to design a hierarchy topology, in addition to energy optimization, find an optimum clusters number and best location of ...

متن کامل

Scalable Analytics Model Calibration with Online Aggregation

Model calibration is a major challenge faced by the plethora of statistical analytics packages that are increasingly used in Big Data applications. Identifying the optimal model parameters is a time-consuming process that has to be executed from scratch for every dataset/model combination even by experienced data scientists. We argue that the lack of support to quickly identify sub-optimal conf...

متن کامل

Speculative Approximations for Terascale Analytics

Model calibration is a major challenge faced by the plethora of statistical analytics packages that are increasingly used in Big Data applications. Identifying the optimal model parameters is a time-consuming process that has to be executed from scratch for every dataset/model combination even by experienced data scientists. We argue that the incapacity to evaluate multiple parameter configurat...

متن کامل

Design and Test of the Real-time Text mining dashboard for Twitter

One of today's major research trends in the field of information systems is the discovery of implicit knowledge hidden in dataset that is currently being produced at high speed, large volumes and with a wide variety of formats. Data with such features is called big data. Extracting, processing, and visualizing the huge amount of data, today has become one of the concerns of data science scholar...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2014

LOOM: Optimal Aggregation Overlays for In-Memory Big Data Processing

نویسندگان

چکیده

منابع مشابه

Revisiting Aggregation for Data Intensive Applications: A Performance Study

A Hierarchy Topology Design Using a Hybrid Evolutionary Algorithm in Wireless Sensor Networks

Scalable Analytics Model Calibration with Online Aggregation

Speculative Approximations for Terascale Analytics

Design and Test of the Real-time Text mining dashboard for Twitter

عنوان ژورنال:

اشتراک گذاری