Multi-Task Learning for Straggler Avoiding Predictive Job Scheduling
نویسندگان
چکیده
Parallel processing frameworks (Dean and Ghemawat, 2004) accelerate jobs by breaking them into tasks that execute in parallel. However, slow running or straggler tasks can run up to 8 times slower than the median task on a production cluster (Ananthanarayanan et al., 2013), leading to delayed job completion and inefficient use of resources. Existing straggler mitigation techniques wait to detect stragglers and then relaunch them, delaying straggler detection and wasting resources. We built Wrangler (Yadwadkar et al., 2014), a system that predicts when stragglers are going to occur and makes scheduling decisions to avoid such situations. To capture node and workload variability, Wrangler built separate models for every node and workload, requiring the time-consuming collection of substantial training data. In this paper, we propose multi-task learning formulations that share information between the various models, allowing us to use less training data and bring training time down from 4 hours to 40 minutes. Unlike naive multi-task learning formulations, our formulations capture the shared structure in our data, improving generalization performance on limited data. Finally, we extend these formulations using group sparsity inducing norms to automatically discover the similarities between tasks and improve interpretability.
منابع مشابه
Faster Jobs in Distributed Data Processing using Multi-Task Learning
Slow running or straggler tasks in distributed processing frameworks [1, 2] can be 6 to 8 times slower than the median task in a job on a production cluster [3], despite existing mitigation techniques. This leads to extended job completion times, inefficient use of resources, and increased costs. Recently, proactive straggler avoidance techniques [4] have explored the use of predictive models t...
متن کاملProactive Straggler Avoidance using Machine Learning
The MapReduce architecture provides self-managed parallelization with fault tolerance for large-scale data processing. Stragglers, the tasks running slower than other tasks of a job, could potentially degrade the overall cluster performance by increasing the job completion time. The original MapReduce paper [1] identified that Stragglers could arise due to various reasons including software mis...
متن کاملAn Effective Task Scheduling Framework for Cloud Computing using NSGA-II
Cloud computing is a model for convenient on-demand user’s access to changeable and configurable computing resources such as networks, servers, storage, applications, and services with minimal management of resources and service provider interaction. Task scheduling is regarded as a fundamental issue in cloud computing which aims at distributing the load on the different resources of a distribu...
متن کاملA multi Agent System Based on Modified Shifting Bottleneck and Search Techniques for Job Shop Scheduling Problems
This paper presents a multi agent system for the job shop scheduling problems. The proposed system consists of initial scheduling agent, search agents, and schedule management agent. In initial scheduling agent, a modified Shifting Bottleneck is proposed. That is, an effective heuristic approach and can generate a good solution in a low computational effort. In search agents, a hybrid search ap...
متن کاملAn algorithm for multi-objective job shop scheduling problem
Scheduling for job shop is very important in both fields of production management and combinatorial op-timization. However, it is quite difficult to achieve an optimal solution to this problem with traditional opti-mization approaches owing to the high computational complexity. The combination of several optimization criteria induces additional complexity and new problems. In this paper, we pro...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of Machine Learning Research
دوره 17 شماره
صفحات -
تاریخ انتشار 2016