mapreduce

Evaluating map reduce tasks scheduling algorithms over cloud computing infrastructure

Journal: Concurrency and Computation: Practice and Experience, 2015

Qutaibah Althebyan, Yaser Jararweh, Qussai Yaseen, Omar AlQudah, Mahmoud Al-Ayyoub

Efficiently scheduling MapReduce tasks is considered one of the major challenges facing MapReduce frameworks. Many algorithms have been introduced to tackle this issue. Most of these algorithms focus on the data locality property for task scheduling. Data locality may lower physical resource utilization in non-virtualized clusters and increase power consumption. Virtualized clust...

MapReduce Is Good Enough?

2013

Jimmy Lin

Hadoop is currently the large-scale data analysis “hammer” of choice, but there exist classes of algorithms that aren’t “nails” in the sense that they are not particularly amenable to the MapReduce programming model. To address this, researchers have proposed MapReduce extensions or alternative programming models in which these algorithms can be elegantly expressed. This article espouses a ...

Securing MapReduce Result Integrity via Verification-based Integrity Assurance Framework

2014

Yongzhi Wang, Jinpeng Wei, Yucong Duan

MapReduce, a large-scale data processing paradigm, is gaining popularity. However, like other distributed computing frameworks, MapReduce suffers from the integrity assurance vulnerability: malicious workers in the MapReduce cluster could tamper with its computation result and thereby render the overall computation result inaccurate. Existing solutions are effective in defeating the malicious b...

Optimizing Theta-Joins in a MapReduce Environment

2013

Changchun Zhang, Jing Li, Lei Wu

Data analysis and processing are important tasks in cloud computing. In this field, the MapReduce framework has become an increasingly popular tool for analyzing large-scale data over large clusters. Compared with parallel relational databases, it has the advantages of excellent scalability and good fault tolerance. However, the performance of join operation using MapReduce is not as good as t...
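For readers unfamiliar with the term in the title, a theta-join pairs rows from two relations whenever an arbitrary comparison predicate holds, not only equality. The sketch below is a single-process nested-loop illustration of that definition only; it is not the optimization proposed in the paper, and the names `theta_join`, `orders`, and `limits` are hypothetical.

```python
from typing import Callable, List, Tuple

Row = Tuple[str, int]


def theta_join(
    left: List[Row], right: List[Row], theta: Callable[[Row, Row], bool]
) -> List[Tuple[Row, Row]]:
    """Return every (left_row, right_row) pair for which theta is true."""
    return [(l, r) for l in left for r in right if theta(l, r)]


# Hypothetical sample relations: (name, numeric value).
orders = [("o1", 5), ("o2", 12)]
limits = [("small", 10), ("large", 20)]

# Join condition "order value < limit value" (an inequality, hence a theta-join).
result = theta_join(orders, limits, lambda o, l: o[1] < l[1])
```

The nested loop compares every pair of rows, so its cost grows with the product of the two relation sizes.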

MapReduce Based Experimental Frame for Parallel and Distributed Simulation Using Hadoop Platform

2014

Byeong Soo Kim, Sun Ju Lee, Tag Gon Kim, Hae Sang Song

Simulation-based experiment of complex systems is a time-consuming job. Parallel and distributed simulation is one of the methods to reduce the simulation time. To simulate and analyze the system with this method, a suitable experimental frame must be designed. Therefore, this paper proposes a MapReduce based experimental frame for the parallel and distributed simulation. Because Hadoop...

Real-Time MapReduce Scheduling

2010

Linh T.X. Phan, Zhuoyao Zhang, Boon Thau Loo, Insup Lee

In this paper, we explore the feasibility of enabling the scheduling of mixed hard and soft real-time MapReduce applications. We first present an experimental evaluation of the popular Hadoop MapReduce middleware on the Amazon EC2 cloud. Our evaluation reveals tradeoffs between overall system throughput and execution time predictability, as well as highlights a number of factors affecting real-...

Scalable Scientific Computing Algorithms Using MapReduce

2013

Jingen Xiang

Cloud computing systems, like MapReduce and Pregel, provide a scalable and fault tolerant environment for running computations at massive scale. However, these systems are designed primarily for data intensive computational tasks, while a large class of problems in scientific computing and business analytics are computationally intensive (i.e., they require a lot of CPU in addition to I/O). In ...

MapReduce in MPI for Large-scale graph algorithms

Journal: Parallel Computing, 2011

Steven J. Plimpton, Karen D. Devine

We describe a parallel library written with message-passing (MPI) calls that allows algorithms to be expressed in the MapReduce paradigm. This means the calling program does not need to include explicit parallel code, but instead provides “map” and “reduce” functions that operate independently on elements of a data set distributed across processors. The library performs needed data movement bet...
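The division of labor described in this abstract, where the caller supplies only “map” and “reduce” functions and the library handles grouping, can be sketched in a few lines. This is a single-process Python illustration under stated assumptions, not the MPI library's API: `map_reduce`, `degree_map`, and `degree_reduce` are hypothetical names, and the in-memory dictionary stands in for the data movement between processors.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[str, str]


def map_reduce(
    records: Iterable,
    map_fn: Callable[..., Iterable[Tuple[Hashable, object]]],
    reduce_fn: Callable[[Hashable, List], object],
) -> Dict:
    """Apply map_fn to each record, group values by key, then reduce each group."""
    groups: Dict[Hashable, List] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)  # stand-in for the shuffle between processors
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def degree_map(edge: Edge):
    """Emit a count of 1 for each endpoint of an undirected edge."""
    u, v = edge
    yield u, 1
    yield v, 1


def degree_reduce(vertex: str, counts: List[int]) -> int:
    """Sum the per-edge counts to get the vertex degree."""
    return sum(counts)


# Hypothetical toy graph as an edge list.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
degrees = map_reduce(edges, degree_map, degree_reduce)
```

Neither user-supplied function contains any parallel code; each sees only one edge or one vertex's values at a time, which is what lets a library distribute the work.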

Parallel Tree Reduction on MapReduce

2012

Kento Emoto, Hiroto Imachi

MapReduce, the de facto standard for large scale data-intensive applications, is a remarkable parallel programming model, allowing for easy parallelization of data intensive computations over many machines in a cloud. As huge tree data such as XML has achieved the status of the de facto standard for representing structured information, the situation calls for efficient MapReduce programs treati...

Boosting the Efficiency in Similarity Search on Signature Collections

2013

Jong Wook Kim

Computing all signature pairs whose bit differences are less than or equal to a given threshold in large signature collections is an important problem in many applications. In this paper, we leverage MapReduce-based parallelization in order to enable scalable similarity search on the signatures. A roadblock in using the MapReduce framework for this problem, however, is that the cost of merging and ...
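The problem stated in the first sentence of this abstract can be made concrete with a brute-force baseline. The sketch below assumes signatures are stored as Python integers and compares every pair; it is not the MapReduce method of the paper, and `hamming_distance` and `similar_pairs` are hypothetical names.

```python
from itertools import combinations
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def similar_pairs(signatures: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, whose bit difference is <= threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(signatures), 2)
        if hamming_distance(a, b) <= threshold
    ]


# Hypothetical 4-bit signatures.
signatures = [0b1011, 0b1001, 0b0110, 0b1111]
pairs = similar_pairs(signatures, threshold=1)
```

This baseline performs a quadratic number of comparisons, which is the cost that makes a scalable, parallel approach attractive for large collections.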
