Tarcil: High Quality and Low Latency Scheduling in Large, Shared Clusters

Authors

Christina DELIMITROU

Daniel SANCHEZ

Christos KOZYRAKIS

Abstract

Scheduling diverse applications in large, shared clusters is particularly challenging. Recent research on cluster management focuses either on scheduling speed, using sampling techniques to quickly assign tasks to resources, or on scheduling quality, using centralized algorithms that examine the cluster state to find the most suitable resources that improve both task performance and cluster utilization. We present Tarcil, a distributed scheduler that targets both scheduling speed and quality, making it appropriate for large, highly loaded clusters running both short and long jobs. Tarcil uses an analytically derived sampling framework that dynamically adjusts the sample size based on load and provides guarantees on the quality of scheduling decisions with respect to resource heterogeneity and workload interference. It also implements admission control when sampling is unlikely to find suitable resources for a task. We evaluate Tarcil on clusters with hundreds of servers on EC2. For highly loaded clusters running short jobs, Tarcil improves task execution time by 41% over a distributed, sampling-based scheduler. For more general workload scenarios, Tarcil increases the fraction of tasks that achieve near-ideal performance by 4x and 2x compared to sampling-based and centralized scheduling, respectively.
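The abstract does not spell out Tarcil's analytical framework. As a hedged illustration of the general idea only, the sketch below uses a standard independence argument: if a fraction g of servers would give a task acceptable quality and the scheduler draws R servers uniformly at random, the chance that none of them is acceptable is (1 - g)^R, so the smallest R that reaches a target success probability p is ceil(ln(1 - p) / ln(1 - g)). Higher load shrinks g and therefore raises R; when R would exceed a sampling budget, the sketch falls back to admission control and leaves the task queued. The function names, the per-server quality scores, and the parameters `frac_good`, `target_prob`, and `max_samples` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Optional, Sequence


def required_sample_size(frac_good: float, target_prob: float) -> Optional[int]:
    """Smallest R such that R independent uniform samples contain at least one
    acceptable server with probability >= target_prob.

    frac_good is an assumed, externally estimated fraction of servers whose
    quality for this task meets the threshold. Returns None if it is 0.
    """
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be in (0, 1)")
    if frac_good <= 0.0:
        return None
    if frac_good >= 1.0:
        return 1
    # P(no acceptable server among R samples) = (1 - frac_good) ** R.
    # Require (1 - frac_good) ** R <= 1 - target_prob and solve for R.
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - frac_good))


def place_task(qualities: Sequence[float], frac_good: float, target_prob: float,
               max_samples: int, rng: random.Random) -> Optional[int]:
    """Return the index of the best sampled server, or None to signal
    admission control (hypothetical policy: queue rather than place poorly)."""
    r = required_sample_size(frac_good, target_prob)
    if r is None or r > max_samples:
        return None
    # Only the sampled servers are inspected; the full cluster is never scanned.
    sample = [rng.randrange(len(qualities)) for _ in range(r)]
    return max(sample, key=lambda i: qualities[i])


if __name__ == "__main__":
    print(required_sample_size(0.5, 0.9))  # 4
    print(required_sample_size(0.1, 0.9))  # 22
    qualities = [0.2, 0.9, 0.4, 0.95, 0.1, 0.3, 0.85, 0.5]
    # Very few acceptable servers: R would be 90, above the budget of 32.
    print(place_task(qualities, 0.05, 0.99, 32, random.Random(0)))  # None
```

In this sketch, dropping `frac_good` from 0.5 to 0.1 raises the sample size from 4 to 22 for the same 90% target, which mirrors the abstract's point that sample size should grow with load; the fixed budget `max_samples` is an assumed knob that decides when admission control kicks in.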

Similar resources

Tarcil: Reconciling Scheduling Speed and Quality in Large Datacenters

Scheduling diverse applications in large, shared clusters is particularly challenging. Recent research on cluster scheduling focuses either on scheduling speed, using sampling to quickly assign resources to tasks, or on scheduling quality, using centralized algorithms that search for the resources that improve both task performance and cluster utilization. We present Tarcil, a distributed sched...

A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints

One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...

Implementing high-level parallelism on computational GRIDs

Special purpose high performance computers are expensive and rare, but workstation clusters are cheap and becoming common. Emerging technology offers the opportunity to integrate clusters into a single high-performance computer: a computational Grid. The acceptance of computational Grids, however, is seriously hampered by the difficulty of efficiently managing the parallelism in such heterogeneo...

Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization

Data centers are evolving to host heterogeneous workloads on shared clusters to reduce the operational cost and achieve higher resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and QoS constraints. On the one hand, latency-critical jobs need to be scheduled as soon as they are submitted to avoid any queuing delays. On the oth...

Towards zero latency photonic switching in shared memory networks

Photonic networks-on-chip based on silicon photonics have been proposed to reduce latency and power consumption in future chip multi-core processors (CMP). However, high performance CMPs use a shared memory model which generates large numbers of short messages, creating high arbitration latency overhead for photonic switching networks. In this paper we explore techniques which intelligently use...

Publication date: 2015