MREv: An Automatic MapReduce Evaluation Tool for Big Data Workloads
نویسندگان
چکیده
The popularity of Big Data computing models like MapReduce has caused the emergence of many frameworks oriented to High Performance Computing (HPC) systems. The suitability of each one to a particular use case depends on its design and implementation, the underlying system resources and the type of application to be run. Therefore, the appropriate selection of one of these frameworks generally involves the execution of multiple experiments in order to assess their performance, scalability and resource efficiency. This work studies the main issues of this evaluation, proposing a new MapReduce Evaluator (MREv) tool which unifies the configuration of the frameworks, eases the task of collecting results and generates resource utilization statistics. Moreover, a practical use case is described, including examples of the experimental results provided by this tool. MREv is available to download at http://mrev.des.udc.es.
منابع مشابه
Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads
Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. As interactive, large-scale query pro...
متن کاملA Performance Study of Big Data on Small Nodes
The continuous increase in volume, variety and velocity of Big Data exposes datacenter resource scaling to an energy utilization problem. Traditionally, datacenters employ x8664 (big) server nodes with power usage of tens to hundreds of Watts. But lately, low-power (small) systems originally developed for mobile devices have seen significant improvements in performance. These improvements could...
متن کاملPreemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization
Data centers are evolving to host heterogeneous workloads on shared clusters to reduce the operational cost and achieve higher resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and QoS constraints. On the one hand, latency-critical jobs need to be scheduled as soon as they are submitted to avoid any queuing delays. On the oth...
متن کاملCloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
متن کاملRedoop Infrastructure for Recurring Big Data Queries
This demonstration presents the Redoop infrastructure, the first fullfledged MapReduce framework with native support for recurring big data queries. Recurring queries, repeatedly being executed for long periods of time over evolving high-volume data, have become a bedrock component in most large-scale data analytic applications. Redoop is a comprehensive extension to Hadoop that pushes the supp...
متن کامل