Natjam: Eviction Policies For Supporting Priorities and Deadlines in Mapreduce Clusters
Abstract
This paper presents Natjam, a system that supports arbitrary job priorities, hard real-time scheduling, and efficient preemption for Mapreduce clusters that are resource-constrained. Our contributions include: i) smart eviction policies for jobs and for tasks, based on resource usage, task runtime, and job deadlines; and ii) a work-conserving task preemption mechanism. We incorporated Natjam into the Hadoop YARN scheduler framework (in Hadoop 0.23). We present experiments from deployments on a test cluster, Emulab, and a Yahoo! commercial cluster, using both synthetic traces and Hadoop cluster traces obtained from Yahoo!. Our results reveal that Natjam incurs overheads of under 7%. Under real Hadoop workloads, Natjam performs better than existing techniques.
Similar resources
Research Statement - Muntasir Raihan Rahman
My research goal is to build adaptive big data and cloud systems that can meet a spectrum of user requirements expressed using service level agreements (SLA) and service level objectives (SLO). During my PhD, I have worked on several angles of tracking and enforcing SLA/SLO guarantees in cloud systems, including in Mapreduce clusters, and NoSQL key-value storage systems. My future research goal...
Application profiling and resource management for MapReduce
The scale of data generated and processed is growing exponentially in the Big Data era. This poses a challenge far beyond the reach of a single computing system. Processing such vast amounts of data on a single machine is impracticable in terms of time or cost. Hence, distributed systems, which can harness very large clusters of commodity co...
PASS: Power-Aware Scheduling of Mixed Applications with Deadline Constraints on Clusters
Reducing energy consumption has become a pressing issue in cluster computing systems not only for minimizing electricity cost, but also for improving system reliability. Therefore, it is highly desirable to design energy-efficient scheduling algorithms for applications running on clusters. In this paper, we address the problem of non-preemptively scheduling mixed tasks on power-aware clusters. ...
Network-Aware Task Assignment for MapReduce Applications in Shared Clusters
Running MapReduce applications in shared clusters is becoming increasingly compelling as a way to improve cluster utilization. However, network sharing across diverse applications can make the network bandwidth available to MapReduce applications constrained and heterogeneous, which inevitably increases the severity of network hotspots in racks, and makes the existing task assignment policies that focus ...
Scather: programming with multi-party computation and MapReduce
We present a prototype of a distributed computational infrastructure, an associated high-level programming language, and an underlying formal framework that allow multiple parties to leverage their own cloud-based computational resources (capable of supporting MapReduce [27] operations) in concert with multi-party computation (MPC) to execute statistical analysis algorithms that have privacy-pre...