mapreduce

MRBS: A Comprehensive MapReduce Benchmark Suite

2012

Amit Sangroya Damián Serrano Sara Bouchenak

MapReduce is a promising programming model for distributed data processing. Extensive research has been conducted on the scalability of MapReduce, and several systems have been proposed in the literature, ranging from job scheduling to data placement and replication. However, realistic benchmarks are still missing to analyze and compare the effectiveness of these proposals. To date, most MapRed...

Towards a MapReduce Application Performance Model

2014

Jared Gray Thomas C. Bressoud

In the modern age, our ability to generate large data sets far outpaces our capacity for analyzing them. Google’s proposed solution to this fundamental problem – the MapReduce paradigm and runtime system – has recently gained traction in the scientific and “big data” industries. However, the performance characteristics of MapReduce are not well known. This paper builds on the efforts of prior re...

Parallel Heuristics for TSP on MapReduce

2010

Siddhartha Jain Matthew Mallozzi

We analyze the possibility of parallelizing the Traveling Salesman Problem over the MapReduce architecture. We present serial and parallel versions of two algorithms, Tabu Search and Large Neighborhood Search. We compare the best tour length achieved by the serial version with the best achieved by the MapReduce version. We show that Tabu Search and Large Neighborhood Search are not well su...

Recomputation-based data reliability for MapReduce using lineage

2016

Sherif Akoush Ripduman Sohan Andy Hopper

Ensuring block-level reliability of MapReduce datasets is expensive due to the spatial overheads of replicating or erasure coding data. As the amount of data processed with MapReduce continues to increase, this cost will increase proportionally. In this paper we introduce Recomputation-Based Reliability in MapReduce (RMR), a system for mitigating the cost of maintaining reliable MapReduce datas...

Vers une plate-forme MapReduce tolérant les fautes byzantines

Journal: Technique et Science Informatiques 2012

Luciana Arantes Alysson Neves Bessani Vinicius V. Cogo Miguel Correia Pedro Costa Jonathan Lejeune M. Piffaretti Olivier Marin Marcelo Pasin Pierre Sens F. Silva Julien Sopena

Byzantine faults are inherent in massively parallel computation, including computation based on the MapReduce model. Yet current MapReduce framework implementations do not tolerate Byzantine failures. Therefore, it is not possible to verify whether the final results of a MapReduce application are correct. In this article, we present a MapReduce architecture where tasks are replicated aiming at ensuring ...

A New Memory MapReduce Framework for Higher Access to Resources

2017

ZuKuan Wei Bo Hong JaeHong Kim

The demand for highly parallel data processing platforms is growing due to an explosion in the number of massive-scale data applications in both academia and industry. MapReduce is one of the most meaningful solutions for distributed big data computing. This paper is based on the work of Hadoop MapReduce. In the face of massive data computing and calculation process, MapReduce genera...

Preliminary Results on Modeling CPU Utilization of MapReduce Programs

2010

Nikzad Babaii Rizvandi Young Choon Lee Albert Y. Zomaya

In this paper, we propose an approach for predicting the CPU utilization of applications when they are running on MapReduce. Our approach has two key components: a set of application experiments run on MapReduce to profile the CPU utilization of the application on a given platform, and a regression-based model that maps the MapReduce configuration parameters (number of Mappers, number of...

Efficient Big Data Processing in Hadoop MapReduce

Journal: PVLDB 2012

Jens Dittrich Jorge-Arnulfo Quiané-Ruiz

This tutorial is motivated by the clear need of many organizations, companies, and researchers to deal with big data volumes efficiently. Examples include web analytics applications, scientific applications, and social networks. A popular data processing engine for big data is Hadoop MapReduce. Early versions of Hadoop MapReduce suffered from severe performance problems. Today, this is becoming...

SQL-to-MapReduce Translation for Efficient OLAP Query Processing with MapReduce

Journal: International Journal of Database Theory and Application 2017

The Barnes-Hut Algorithm in MapReduce

2013

Ross Adelman

MapReduce has been used before to analyze N-body-like data. For example, in [4], a friends-of-friends algorithm was distributed across a MapReduce-like framework. Also, in [5], Pig was used to analyze large amounts of astronomical data. In both of these, the datasets were very large, in the hundreds of GBs and low TBs. These examples give hope that MapReduce can be used effectively on an N-body...
