mapreduce

Distributed Reasoning with EL using MapReduce

2011

Frederick Maier Pascal Hitzler

It has recently been shown that the MapReduce framework for distributed computation can be used effectively for large-scale RDF Schema reasoning, computing the deductive closure of over a billion RDF triples within a reasonable time [23]. Later work has carried this approach over to OWL Horst [22]. In this paper, we provide a MapReduce algorithm for classifying knowledge bases in the descriptio...

MapReduce is Good Enough? If All You Have is a Hammer, Throw Away Everything That's Not a Nail!

Journal: Big Data 2013

Jimmy J. Lin

Hadoop is currently the large-scale data analysis "hammer" of choice, but there exist classes of algorithms that aren't "nails" in the sense that they are not particularly amenable to the MapReduce programming model. To address this, researchers have proposed MapReduce extensions or alternative programming models in which these algorithms can be elegantly expressed. This article espouses a very...

Efficient Processing Distributed Joins with Bloomfilter using MapReduce

2013

Changchun Zhang Lei Wu Jing Li

The MapReduce framework has been widely used to process and analyze large-scale datasets over large clusters. As an essential problem, the join operation over large clusters has attracted more and more attention in recent years due to the adoption of MapReduce. Many strategies have been proposed to improve the efficiency of distributed joins, among which the Bloom filter is a successful one. However, the b...

Preliminary Evaluation of MapReduce for High-Performance Climate Data Analysis

2012

Daniel Q. Duffy John L. Schnase John H. Thompson Shawn M. Freeman Thomas L. Clune

MapReduce is an approach to high-performance analytics that may be useful to data intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. We are particularly interested in the potential of MapReduce to speed up basic operations common to a wide range of analyses. In orde...

A Survey on Parallel Method for Rough Set using MapReduce Technique for Data Mining

2015

Varda C. Dhande B. V. Pawar

This paper presents a survey on data mining, data mining using rough set theory, and data mining using a parallel method for rough set approximation with the MapReduce technique. With the development of information technology, data is growing at a tremendous rate, so big data mining and knowledge discovery have become a new challenge. Rough set theory has been successfully applied in data mining by using MapRe...

Job Scheduling for Multi-User MapReduce Clusters

2009

Matei Zaharia Dhruba Borthakur Joydeep Sen Sarma Khaled Elmeleegy Scott Shenker Ion Stoica

Sharing a MapReduce cluster between users is attractive because it enables statistical multiplexing (lowering costs) and allows users to share a common large data set. However, we find that traditional scheduling algorithms can perform very poorly in MapReduce due to two aspects of the MapReduce setting: the need for data locality (running computation where the data is) and the dependence betwe...

Increasing the throughput of machine translation systems using clouds

Journal: CoRR 2016

Jernej Vicic Andrej Brodnik

The manuscript presents an experiment in implementing a Machine Translation system in a MapReduce model. The empirical evaluation was done using fully implemented translation systems embedded into the MapReduce programming model. Two machine translation paradigms were studied: shallow transfer Rule Based Machine Translation and Statistical Machine Translation. The results show that the Map...

Poster: Cross Cloud MapReduce: an Uncheatable MapReduce

2012

Yongzhi Wang Jinpeng Wei Mudhakar Srivatsa

MapReduce [1] is becoming a popular data processing application in cloud environments. However, security issues make many customers reluctant to move their critical computation tasks to the cloud. For instance, [2] points out a real security vulnerability that the cloud service leader Amazon EC2 suffers from: some members of EC2 can create and share Amazon Machine Image (AMI) to the EC2 community so...

Matching-Based Allocation Strategies for Improving Data Locality of Map Tasks in MapReduce

2017

Olivier Beaumont Thomas Lambert Loris Marchal Bastien Thomas

MapReduce is a well-known framework for distributing data-processing computations on parallel clusters. In MapReduce, a large computation is broken into small tasks that run in parallel on multiple machines, and it scales easily to very large clusters of inexpensive commodity computers. Before the Map phase, the original dataset is first split into chunks that are replicated (a constant number of ...

Hadoop Performance Models

Journal: CoRR 2011

Herodotos Herodotou

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models describe dataflow and cost information at the fine granularity of phases within the map and reduce tasks of a job execution. The models can be used to estimate t...
