Search results for: mapreduce

Number of results: 3018

2010
Jie Pan Frédéric Magoulès Yann Le Biannic

The MapReduce model is a new parallel programming model initially developed for large-scale web content processing. Data analysis faces the problem of how to compute over extremely large datasets. The arrival of MapReduce provides a chance to utilize commodity hardware for massively parallel data analysis applications. The translation and optimization from relational algebra operators to MapRed...

2013
Vasiliki Kalavri Vladimir Vlassov Per Brand

Pig, a high-level dataflow system built on top of Hadoop MapReduce, has greatly facilitated the implementation of data-intensive applications. Pig successfully conceals the limitations of Hadoop’s single-input, inflexible two-stage pipeline by translating scripts into MapReduce jobs. However, these limitations are still present in the backend, often resulting in inefficient execution. Strat...

Journal: IEICE Transactions 2017
Junsu Kim Kyong-Ha Lee Myoung-Ho Kim

With the rapid increase in the number of applications and in the size of data, multi-query processing on the MapReduce framework has gained much attention. Meanwhile, there has been much interest in skyline query processing due to its power in multi-criteria decision making and analysis. Recently, there have been attempts to optimize multi-query processing in MapReduce. However, they are not ...

Journal: CoRR 2012
Nikzad Babaii Rizvandi Young Choon Lee Albert Y. Zomaya

In this paper, we present an approach to predict the total CPU utilization, measured in CPU clock ticks, of applications running on the MapReduce framework. Our approach has two key phases: profiling and modeling. In the profiling phase, an application is run several times with different sets of MapReduce configuration parameters to profile the total CPU clock ticks of the application on a given platf...

2013
Mohammad Hammoud Majd F. Sakr

MapReduce is now a pervasive analytics engine on the cloud. Hadoop is an open source implementation of MapReduce and is currently enjoying wide popularity. Hadoop offers a high-dimensional space of configuration parameters, which makes it difficult for practitioners to set them for efficient and cost-effective execution. In this work we observe that MapReduce application performance is highly influe...

2014
Sylvain Gault Christian Pérez

Whether it is for e-science or business, the amount of data produced every year is growing at a high rate. Managing and processing those data raises new challenges. MapReduce is one answer to the need for scalable tools able to handle the amount of data. It imposes a general structure of computation and lets the implementation perform its optimizations. During the computation, there is a phase c...

2012
Gero Greiner Riko Jacob

Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practice, its theoretical footing is still limited, and little work has been done so far to put MapReduce on a par with the major computational models. Following pioneering work that relates the MapReduce framework with ...

Journal: Concurrency and Computation: Practice and Experience 2014
Zhuoyao Zhang Ludmila Cherkasova Boon Thau Loo

In MapReduce environments, many applications have to achieve different performance goals for producing time-relevant results. One typical user question is how to estimate the completion time of a MapReduce program as a function of varying input dataset sizes and given cluster resources. In this work, we offer a novel performance evaluation framework for answering this question. We analyze t...

Journal: OMICS: A Journal of Integrative Biology 2011
Judy Qiu

Cloud computing [1] offers new approaches for scientific computing that leverage the major commercial hardware and software investment in this area. The suitability of clouds for closely coupled applications is still unclear, as synchronization costs are still higher than on optimized MPI machines. However, loosely coupled problems are very important in many fields and can achieve good cloud performance even when p...

2008

MapReduce is a programming model and an associated implementation used by Google for processing their massive data sets. It has a simple yet powerful interface that is amenable to a broad variety of problems. Since 2003, when the MapReduce framework was first created, more than ten thousand distinct programs have been implemented under this model. A large number of MapReduce tasks are now runni...
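
For readers unfamiliar with the model this abstract describes, a minimal single-process sketch of the map, shuffle, and reduce steps may help. It is an illustration only and not Google's or Hadoop's implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the usual word-count task is assumed as the example problem.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit an intermediate (word, 1) pair for every word."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group intermediate values by key.

    A real framework does this across machines; here it is one dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over the input documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["map reduce map", "reduce shuffle"]))
```

In a distributed setting, the map and reduce calls would run in parallel on different machines; this sketch runs them sequentially to keep the data flow visible.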
