Memory Conscious Scheduling and Processor Allocation on NUMA Architectures
Author
Abstract
Operating system abstractions do not always meet the needs of language or application designers. A lack of efficiency and functionality in scheduling mechanisms can be filled by an application-specific runtime environment that provides mechanisms for dynamic processor allocation and memory-conscious scheduling. We believe that a synergistic approach involving three components (the operating system, a user-level runtime system, and a dynamic processor server) offers the best adaptivity to the needs of multiprogramming. On NUMA architectures in particular, the data structures and policies of a scheduling architecture have to reflect the various levels of the memory hierarchy in order to achieve high data locality. While CPU utilization still determines the scheduling decisions of contemporary schedulers, we propose novel scheduling policies based on cache miss rates. An interface between the user-level runtime system and the application is essential for initiating concurrent memory prefetching: the application is informed about scheduling decisions of the runtime system and can trigger prefetch operations. For the implementation of the runtime system we follow a two-level approach: the lower level consists of assembler code for fast thread initialization and context switching, while the upper level comprises the user-level scheduler, which provides load balancing and high cache reuse on top of kernel threads. Because static processor sets, MACH cpu_servers, and gang scheduling do not offer the required flexibility and efficiency in processor allocation and scheduling, a new approach to these topics had to be developed. We sketch the design of an adaptive dynamic processor server whose decisions are based on processor requests and on information about the memory locality of currently running applications.
An interface between the processor server and the user-level scheduler allows the exchange of information needed to establish a dynamic partitioning of the processors among multiple parallel applications, achieving an optimum between throughput and fairness.
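To illustrate the cache-miss-rate-driven policy the abstract proposes, the following is a minimal sketch of a user-level dispatch decision: the scheduler prefers threads whose working set already resides on the local NUMA node, and among those, the one with the lowest recent miss rate (highest cache reuse). The `uthread` structure, the `pick_thread` function, and the miss-rate field are illustrative assumptions, not part of the system described in the paper.

```c
#include <assert.h>

/* Hypothetical per-thread bookkeeping: the runtime tracks each
 * thread's recent cache miss rate and the NUMA node where most of
 * its working set resides. */
typedef struct {
    int id;
    int home_node;    /* NUMA node holding most of the thread's data */
    double miss_rate; /* observed cache misses per 1000 instructions */
} uthread;

/* Pick the index of the runnable thread with the best expected
 * locality on `node`: prefer threads whose data already lives on this
 * node, and among those, the one with the lowest recent miss rate. */
int pick_thread(const uthread *ready, int n, int node) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (ready[i].home_node != node)
            continue;
        if (best < 0 || ready[i].miss_rate < ready[best].miss_rate)
            best = i;
    }
    /* No local thread: fall back to any runnable thread, trading
     * locality for load balance. */
    if (best < 0)
        best = 0;
    return best;
}
```

Notifying the application of the chosen thread (per the prefetch interface above) would then let it issue prefetches for that thread's data before the context switch completes.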
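One plausible shape for the dynamic partitioning negotiated between processor server and user-level scheduler is a two-pass equipartition: each application is granted at most its equal share of the machine, and leftover processors are redistributed to applications that requested more. This is a hedged sketch of one such policy, not the server's actual algorithm; `allocate` and its request/grant arrays are hypothetical.

```c
#include <assert.h>

/* Hypothetical processor-server allocation: grant each of n
 * applications min(request, equal share) of `total` processors, then
 * redistribute the remainder to applications still wanting more. */
void allocate(int total, const int *request, int *grant, int n) {
    int remaining = total;
    int share = total / n;

    /* First pass: everyone gets at most the equal share. */
    for (int i = 0; i < n; i++) {
        grant[i] = request[i] < share ? request[i] : share;
        remaining -= grant[i];
    }
    /* Second pass: hand leftovers to still-hungry applications. */
    for (int i = 0; i < n && remaining > 0; i++) {
        int want = request[i] - grant[i];
        int extra = want < remaining ? want : remaining;
        grant[i] += extra;
        remaining -= extra;
    }
}
```

A real server would additionally weigh the memory-locality information mentioned in the abstract, e.g. preferring to grant an application processors on the nodes that already hold its data.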
Similar Articles
Multiprogrammed Parallel Application Scheduling in NUMA Multiprocessors
The invention, acceptance, and proliferation of multiprocessors are primarily a result of the quest to increase computer system performance. The most promising features of multiprocessors are their potential to solve problems faster than previously possible and to solve larger problems than previously possible. Large-scale multiprocessors offer the additional advantage of being able to execute ...
The Effect of Multi-core on HPC Applications in Virtualized Systems
In this paper, we evaluate the overheads of virtualization in commercial multicore architectures with shared memory and MPI-based applications. We find that the non-uniformity of memory latencies affects the performance of virtualized systems significantly. Due to the lack of support for non-uniform memory access (NUMA) in the Xen hypervisor, shared memory applications suffer from a significant...
Memory Performance and SPEC OpenMP Scalability on Quad-Socket x86_64 Systems
Because of the continuous trend towards higher core counts, parallelization is mandatory for many application domains beyond the traditional HPC sector. Current commodity servers comprise up to 48 processor cores in configurations with only four sockets. Those shared memory systems have distinct NUMA characteristics. The exact location of data within the memory system significantly affects both...
Migration with Dynamic Space-Sharing Scheduling Policies: The Case of the SGI O2000
In this paper, we claim that memory migration is a useful mechanism for improving the execution of parallel applications in dynamic execution environments, but that its performance depends on related system components such as processor scheduling. To show this, we evaluate the automatic memory migration mechanism provided by IRIX on Origin systems under different dynamic processor ...
Classifying and alleviating the communication overheads in matrix computations on large-scale NUMA multiprocessors
Large-scale, shared-memory multiprocessors have non-uniform memory access (NUMA) costs, and communication cost dominates the execution time of matrix computations. Memory contention and remote memory access are the two major communication overheads on large-scale NUMA multiprocessors. However, previous experiments and discussions focus either on reducing the number of remote memory accesses...