Tarcil: High Quality and Low Latency Scheduling in Large, Shared Clusters
نویسندگان
چکیده
Scheduling diverse applications in large, shared clusters is particularly challenging. Recent research on cluster management focuses either on scheduling speed, using sampling techniques to quickly assign tasks to resources, or on scheduling quality, using centralized algorithms that examine the cluster state to find the most suitable resources that improve both task performance and cluster utilization. We present Tarcil, a distributed scheduler that targets both scheduling speed and quality, making it appropriate for large, highly-loaded clusters running both short and long jobs. Tarcil uses an analytically derived sampling framework that dynamically adjusts the sample size based on load and provides guarantees on the quality of scheduling decisions with respect to resource heterogeneity and workload interference. It also implements admission control when sampling is unlikely to find suitable resources for a task. We evaluate Tarcil on clusters with hundreds of servers on EC2. For highly-loaded clusters running short jobs, Tarcil improves task execution time by 41% over a distributed, sampling-based scheduler. For more general workload scenarios, Tarcil increases the fraction of tasks that achieve near ideal performance by 4x and 2x compared to sampling-based and centralized scheduling respectively.
منابع مشابه
Tarcil: Reconciling Scheduling Speed and Quality in Large Datacenters
Scheduling diverse applications in large, shared clusters is particularly challenging. Recent research on cluster scheduling focuses either on scheduling speed, using sampling to quickly assign resources to tasks, or on scheduling quality, using centralized algorithms that search for the resources that improve both task performance and cluster utilization. We present Tarcil, a distributed sched...
متن کاملA Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...
متن کاملImplementing high-level parallelism on computational GRIDs
Special purpose high performance computers are expensive and rare, but workstation clusters are cheap and becoming common. Emerging technology offers the opportunity to integrate clusters into a single high performance computer a computational Grid. The acceptance of computational Grids, however, is seriously hampered by the difficulty of efficiently managing the parallelism in such heterogeneo...
متن کاملPreemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization
Data centers are evolving to host heterogeneous workloads on shared clusters to reduce the operational cost and achieve higher resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and QoS constraints. On the one hand, latency-critical jobs need to be scheduled as soon as they are submitted to avoid any queuing delays. On the oth...
متن کاملTowards zero latency photonic switching in shared memory networks
Photonic networks-on-chip based on silicon photonics have been proposed to reduce latency and power consumption in future chip multi-core processors (CMP). However, high performance CMPs use a shared memory model which generates large numbers of short messages, creating high arbitration latency overhead for photonic switching networks. In this paper we explore techniques which intelligently use...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015