ROUTE: run-time robust reducer workload estimation for MapReduce
Abstract
MapReduce has become a popular model for large-scale data processing in recent years. Many works on MapReduce scheduling (e.g., load balancing and deadline-aware scheduling) have emphasized the importance of predicting the workload received by individual reducers. However, because the input characteristics and the user-specified map function of a given job are unknown to the MapReduce framework before the job starts, accurately predicting reducer workload is difficult. To address this challenge, we present ROUTE, a run-time robust reducer workload estimation technique for MapReduce. ROUTE progressively samples the partition sizes of the early completed mappers, which allows it to perform estimation at run time while still fulfilling the accuracy requirement specified by users. Moreover, by using robust estimation and bootstrapping resampling techniques, ROUTE can achieve high applicability to a wide variety of applications. Through experiments using both real and synthetic data on an 11-node Hadoop cluster, we show that ROUTE achieves high accuracy, with an error rate of no more than 10.92% and a 40.6% improvement in error rate compared with the state-of-the-art solution. In addition, through simulations using synthetic data, we show that ROUTE is robust to a variety of skewed distributions. Finally, we apply ROUTE to existing load balancing and deadline-aware scheduling frameworks and show that ROUTE significantly improves the performance of these frameworks. Copyright © 2016 John Wiley & Sons, Ltd.
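The abstract does not give ROUTE's actual estimator, so the following is only a minimal sketch of the general idea it describes: sample partition sizes from mappers as they finish, form a robust estimate, and use bootstrap resampling to decide when the estimate is tight enough to stop. The function names, the use of the median as the robust estimator, the 95% interval, and the stopping rule are all assumptions made for illustration and are not taken from the paper.

```python
import random
import statistics


def estimate_reducer_load(sampled_sizes, total_mappers, n_boot=1000, seed=0):
    """Estimate one reducer's total input from a sample of partition sizes.

    Hypothetical helper: the median is an assumed stand-in for a robust
    estimator. Returns (estimate, relative half-width of a 95% bootstrap
    percentile interval).
    """
    rng = random.Random(seed)
    k = len(sampled_sizes)
    # Scale a robust per-mapper size up to all mappers of the job.
    point = statistics.median(sampled_sizes) * total_mappers
    # Bootstrap: resample with replacement and recompute the estimate.
    boots = sorted(
        statistics.median(rng.choices(sampled_sizes, k=k)) * total_mappers
        for _ in range(n_boot)
    )
    lo = boots[int(0.025 * n_boot)]
    hi = boots[int(0.975 * n_boot) - 1]
    rel = (hi - lo) / (2 * point) if point else float("inf")
    return point, rel


def progressive_estimate(size_stream, total_mappers,
                         max_rel_error=0.1, min_samples=5):
    """Consume partition sizes as mappers finish; stop once tight enough.

    Returns (estimate, number of mappers sampled).
    """
    seen = []
    for size in size_stream:
        seen.append(size)
        if len(seen) < min_samples:
            continue
        est, rel = estimate_reducer_load(seen, total_mappers)
        if rel <= max_rel_error:
            return est, len(seen)
    if not seen:
        raise ValueError("no completed mappers to sample from")
    # Stream exhausted before the target was met: return the best effort.
    est, _ = estimate_reducer_load(seen, total_mappers)
    return est, len(seen)


if __name__ == "__main__":
    # Partition sizes (bytes) for one reducer, in mapper completion order.
    sizes = [100, 104, 98, 101, 97, 103, 99, 102]
    print(progressive_estimate(sizes, total_mappers=50))
```

In this sketch `max_rel_error` plays the role of the user-specified accuracy requirement: a looser value lets the estimate be produced after fewer mappers have completed, at the cost of a wider bootstrap interval.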
Similar resources
Handling Data Skew in MapReduce
MapReduce systems have become popular for processing large data sets and are increasingly being used in e-science applications. In contrast to simple application scenarios like word count, e-science applications involve complex computations which pose new challenges to MapReduce systems. In particular, (a) the runtime complexity of the reducer task is typically high, and (b) scientific data is ...
Traffic Analysis in MapReduce
MapReduce is a programming model that processes large sets of data and produces output. MapReduce uses two functions to complete the work: the Map function and the Reduce function. The Map function is assigned fragmented data as input, emits intermediate data with keys, and sends this intermediate keyed data to the Reducer, where Reducer will get the inpu...
Efficient Skyline Computation in MapReduce
Skyline queries are useful for finding interesting tuples from a large data set according to multiple criteria. The sizes of data sets are constantly increasing, and back-end architectures are switching from single-node environments to non-conventional paradigms like MapReduce. Despite the usefulness of skyline queries, existing works on skyline computation in MapReduce do not take full a...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system's rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry
In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the MapReduce framework, which result in efficient MapReduce algorithms for sorting, 1-dimensional all nearest-neighbors, 2-dimensional convex hulls, 3-dimensional ...
Journal title:
- Int. Journal of Network Management
Volume 26, Issue
Pages -
Publication date: 2016