Search results for: mapreduce

Number of results: 3018

2010
Guozhang Wang

MapReduce [10] gives us an appropriate model for distributed parallel computing. Several of its features have proved useful: 1) centralized job distribution; 2) a fault-tolerance mechanism for both masters and workers. Although there are controversies about MapReduce's capability to replace standard RDBMS [12, 13], it is reasonable that existing proposals to use MapReduce in relational data ...

Journal: Future Generation Comp. Syst. 2011
Saurabh Sehgal Miklós Erdélyi André Merzky Shantenu Jha

Application-level interoperability is defined as the ability of an application to utilize multiple distributed heterogeneous resources. Such interoperability is becoming increasingly important with increasing volumes of data, multiple sources of data as well as resource types. The primary aim of this paper is to understand different ways and levels in which application-level interoperability ca...

2014
Inna Pereverzeva Michael J. Butler Asieh Salehi Fathabadi Linas Laibinis Elena Troubitsyna

MapReduce is a powerful distributed data processing model that is currently adopted in a wide range of domains to efficiently handle large volumes of data, i.e., cope with the big data surge. In this paper, we propose an approach to formal derivation of the MapReduce framework. Our approach relies on stepwise refinement in Event-B and, in particular, the event refinement structure approach – a ...

Journal: PVLDB 2012
Yanpei Chen Sara Alspaugh Randy H. Katz

Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and increasingly interactive jobs in addition to the large, long-running batch jobs for which MapReduce was originally designed. As interactive, large-scale query pro...

Journal: CoRR 2014
Liya Fan Bo Gao Xi Sun Fa Zhang Zhiyong Liu

Load balance is important for MapReduce to reduce job duration, increase parallel efficiency, etc. Previous work focuses on coarse-grained scheduling. This study concerns fine-grained scheduling of MapReduce operations. Each operation represents one invocation of the Map or Reduce function. Scheduling MapReduce operations is difficult due to highly skewed operation loads, no support to collect w...

2011
Zhiang Wu Jie Cao Changjian Fang

Distributed data mining (DDM), which often utilizes autonomous agents, is a process to extract globally interesting associations, classifiers, clusters, and other patterns from distributed data. As datasets double in size every year, moving the data repeatedly to distant CPUs brings about high communication cost. In this paper, data cloud is utilized to implement DDM in order to move the data rat...

2014
Kasper Mullesgaard Jens Laurits Pedersen Hua Lu Yongluan Zhou

Skyline queries are useful for finding interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing and the architecture of back-ends is switching from single-node environments to non-conventional paradigms like MapReduce. Despite the usefulness of skyline queries, existing works on skyline computation in MapReduce do not take full a...

2017
Sophie Cerf Mihaly Berekmeri Bogdan Robu Nicolas Marchand Sara Bouchenak

MapReduce is a popular programming model for distributed data processing and Big Data applications running on clouds. Extensive research has been conducted either to improve the dependability or to increase performance of MapReduce, ranging from adaptive and on-demand fault-tolerance solutions, adaptive task scheduling techniques to optimized job execution mechanisms. This paper investigates an...

2009
Karthik Kambatla Abhinav Pathak Himabindu Pucha

Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. The MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale out and leverage several machines to process data in parallel. In this work we argue that such MapReduce-base...

2012
Miao Xin Hao Li Joan Lu

MapReduce is an efficient distributed computing model for large data sets. Data processing is fully distributed across a huge number of nodes, and a MapReduce cluster is highly scalable. However, single-node performance is gradually becoming a bottleneck in compute-intensive jobs, which makes it difficult to extend the MapReduce model to wider application fields such as large-scale image processing ...
