Topology-Aware Scheduling on Blue Waters with Proactive Queue Scanning and Migration-Based Job Placement
Authors
Abstract
Modern HPC systems such as Blue Waters have a multidimensional torus topology, which makes it hard to achieve both high system utilization and high scheduling speed. The low scheduling speed comes from an inefficient resource allocation scheme that relies on a pre-defined Shape Table, which is highly time consuming. The low system utilization is mainly caused by system fragmentation and node drainage. System fragmentation includes both internal fragmentation, due to the convex prism shape requirement, and external fragmentation, resulting from the contiguous allocation strategy. Node drainage occurs when there is insufficient free contiguous space to schedule an incoming large job. In this paper, with the objective of improving resource utilization and scheduling speed, we address the topology-aware scheduling problem on Blue Waters. To improve the scheduling speed, we propose an efficient free partition detection algorithm. To overcome the system drainage caused by large jobs, we propose proactive queue scanning, which searches the queue for large jobs and gains a buffer period in which to prepare the system to accept them. Given the set of jobs in the scan window and the set of free partitions in the system, we can transform the scheduling into an off-line placement problem, which can be modeled as a multiple knapsack problem. We design a migration-based job placement heuristic to make space for large jobs and improve system utilization. Through extensive simulations on modeled trace data, we demonstrate that our approach works well in improving both system utilization and scheduling speed.
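As a rough illustration of the multiple knapsack view mentioned in the abstract, the sketch below assigns the jobs in a scan window to free partitions with a simple greedy rule (largest job first, best-fitting partition). This is not the paper's migration-based heuristic: the function name `place_jobs`, the job sizes, and the partition capacities are hypothetical, and the torus shape constraints are ignored by treating sizes as plain node counts.

```python
from typing import Dict, Optional


def place_jobs(job_sizes: Dict[str, int],
               partition_caps: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Greedy best-fit placement of jobs (items) into free partitions (knapsacks).

    Hypothetical simplification: sizes and capacities are node counts only.
    """
    remaining = dict(partition_caps)  # free nodes left in each partition
    placement: Dict[str, Optional[str]] = {}
    # Consider the largest jobs first so small jobs do not crowd them out.
    for job, size in sorted(job_sizes.items(), key=lambda kv: -kv[1]):
        # Partitions that still have room for this job.
        fits = [p for p, free in remaining.items() if free >= size]
        if not fits:
            placement[job] = None  # no partition fits; job stays in the queue
            continue
        # Best fit: choose the partition that leaves the least leftover space.
        best = min(fits, key=lambda p: remaining[p] - size)
        remaining[best] -= size
        placement[job] = best
    return placement


if __name__ == "__main__":
    # Hypothetical scan window and free partitions.
    jobs = {"A": 512, "B": 128, "C": 64, "D": 300}
    parts = {"P1": 600, "P2": 200}
    print(place_jobs(jobs, parts))
```

A job mapped to `None` here is one that no current free partition can hold; in the approach the abstract describes, that is the situation where migrating running jobs to consolidate free space would come into play.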
Similar resources
CASP: a community-aware scheduling protocol
The existing resource and topology heterogeneity has divided the scheduling solutions into local schedulers and high-level schedulers (a.k.a. meta-schedulers). Although much work has been proposed to optimise job queue based scheduling, seldom has attention been put on the job sharing behaviours between decentralised distributed resource pools, which in turn raises a notable opportunity to expl...
CAPS: Contention-Aware Proactive Scheduling for CMPs
Many Chip Multiprocessors (CMPs) rely on shared caches to hide the latency of inter-thread communications as well as to improve effective memory bandwidth. Yet along comes cache contention, which often results in cache thrashing and severe performance degradation. Because of the variety of programs, a suitable schedule can often alleviate the issues significantly. However, it remains an open qu...
Cross-layer Packet-dependant OFDM Scheduling Based on Proportional Fairness
This paper assumes each user has more than one queue, derives a new packet-dependant proportional fairness power allocation pattern based on the sum of weight capacity and the packet’s priority in users’ queues, and proposes 4 new cross-layer packet-dependant OFDM scheduling schemes based on proportional fairness for heterogeneous classes of traffic. Scenario 1, scenario 2 and scenario 3 lead r...
Communication-Aware Traffic Stream Optimization for Virtual Machine Placement in Cloud Datacenters with VL2 Topology
By pervasiveness of cloud computing, a colossal amount of applications from gigantic organizations increasingly tend to rely on cloud services. These demands caused a great number of applications in form of couple of virtual machines (VMs) requests to be executed on data centers’ servers. Some of applications are as big as not possible to be processed upon a single VM. Also, there exists severa...
Reducing Energy Costs for IBM Blue Gene/P via Power-Aware Job Scheduling
Energy expense is becoming increasingly dominant in the operating costs of high-performance computing (HPC) systems. At the same time, electricity prices vary significantly at different times of the day. Furthermore, job power profiles also differ greatly, especially on HPC systems. In this paper, we propose a smart, power-aware job scheduling approach for HPC systems based on variable energy p...