MREv: An Automatic MapReduce Evaluation Tool for Big Data Workloads

نویسندگان

  • Jorge Veiga
  • Roberto R. Expósito
  • Guillermo L. Taboada
  • Juan Touriño
چکیده

The popularity of Big Data computing models like MapReduce has caused the emergence of many frameworks oriented to High Performance Computing (HPC) systems. The suitability of each one to a particular use case depends on its design and implementation, the underlying system resources and the type of application to be run. Therefore, the appropriate selection of one of these frameworks generally involves the execution of multiple experiments in order to assess their performance, scalability and resource efficiency. This work studies the main issues of this evaluation, proposing a new MapReduce Evaluator (MREv) tool which unifies the configuration of the frameworks, eases the task of collecting results and generates resource utilization statistics. Moreover, a practical use case is described, including examples of the experimental results provided by this tool. MREv is available to download at http://mrev.des.udc.es.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads

Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. As interactive, large-scale query pro...

متن کامل

A Performance Study of Big Data on Small Nodes

The continuous increase in volume, variety and velocity of Big Data exposes datacenter resource scaling to an energy utilization problem. Traditionally, datacenters employ x8664 (big) server nodes with power usage of tens to hundreds of Watts. But lately, low-power (small) systems originally developed for mobile devices have seen significant improvements in performance. These improvements could...

متن کامل

Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization

Data centers are evolving to host heterogeneous workloads on shared clusters to reduce the operational cost and achieve higher resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and QoS constraints. On the one hand, latency-critical jobs need to be scheduled as soon as they are submitted to avoid any queuing delays. On the oth...

متن کامل

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...

متن کامل

Redoop Infrastructure for Recurring Big Data Queries

This demonstration presents the Redoop infrastructure, the first fullfledged MapReduce framework with native support for recurring big data queries. Recurring queries, repeatedly being executed for long periods of time over evolving high-volume data, have become a bedrock component in most large-scale data analytic applications. Redoop is a comprehensive extension to Hadoop that pushes the supp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015