Handling Transient and Persistent Imbalance Together in Distributed and Shared Memory
Not recorded
Abstract
The rapid increase in the number of cores per chip has produced a vast amount of on-node parallelism. Not only is the number of cores per node increasing substantially, but the cores are also becoming heterogeneous. High variability in the performance of hardware components introduces imbalance due to this heterogeneity. Applications are also becoming more complex, which results in dynamic load imbalance. Load imbalance can cause loss of performance and lower system utilization. We address the challenge of handling both transient and persistent load imbalance while maintaining locality and incurring low overhead. In this paper, we propose a new integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to handle the load imbalance problem. It combines an infrequent, periodic assignment of work to cores based on load measurement with user-created tasks. We integrate OpenMP with Charm++ to enable the creation of potential tasks via OpenMP's parallel loop construct. This approach is not specific to Charm++; it is also available to MPI applications through the Adaptive MPI implementation. We show the benefit of this integrated runtime system on three applications: improvements of 2X on ChaNGa on 128K cores and more than 3X on NAMD at 2K cores, as well as a benefit on an MPI application, Kripke, using Adaptive MPI.
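The two mechanisms named in the abstract can be illustrated with a toy model. The sketch below is not the paper's runtime and uses no Charm++ or OpenMP API; it assumes a simple greedy heuristic for the periodic assignment and models task creation as splitting a load spike into equal chunks that go to the least-loaded core. All function names (`rebalance`, `core_loads`, `run_spike_as_tasks`) are hypothetical.

```python
import heapq


def rebalance(measured_loads, num_cores):
    """Periodic step (persistent imbalance): greedy assignment that places
    the heaviest object on the currently least-loaded core.
    This heuristic is an assumption, not the paper's strategy."""
    heap = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment = {}
    for obj, load in sorted(measured_loads.items(), key=lambda kv: -kv[1]):
        core_load, core = heapq.heappop(heap)
        assignment[obj] = core
        heapq.heappush(heap, (core_load + load, core))
    return assignment


def core_loads(assignment, loads, num_cores):
    """Total load per core under a given object-to-core assignment."""
    totals = [0.0] * num_cores
    for obj, core in assignment.items():
        totals[core] += loads[obj]
    return totals


def run_spike_as_tasks(totals, spike, num_tasks):
    """Between periodic steps (transient imbalance): split a load spike
    into equal tasks and hand each one to the least-loaded core."""
    totals = list(totals)
    chunk = spike / num_tasks
    for _ in range(num_tasks):
        target = min(range(len(totals)), key=lambda c: totals[c])
        totals[target] += chunk
    return totals


loads = {"a": 4, "b": 3, "c": 2, "d": 1}
assignment = rebalance(loads, num_cores=2)
balanced = core_loads(assignment, loads, num_cores=2)
after_spike = run_spike_as_tasks(balanced, spike=4.0, num_tasks=4)
```

In this toy setting, a spike of 4 units that stayed on one core would raise that core's load to 9, whereas splitting it into four tasks leaves both cores at 7. The periodic step is meant to be infrequent, so the task mechanism covers the imbalance that appears between two rebalancing points.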
Similar resources
The Region Trap Library: Handling Traps on Application-Defined Regions of Memory
User-level virtual memory (VM) primitives are used in many different application domains including distributed shared memory, persistent objects, garbage collection, and checkpointing. Unfortunately, VM primitives only allow traps to be handled at the granularity of fixed-sized pages defined by the operating system and architecture. In many cases, this results in a size mismatch between pages an...
Linking and Loading in a Persistent DSM Operating System
Our native Java compiler directly generates runtime structures in a persistent Distributed Shared Memory (DSM). The compiler has been used to build a general purpose PC Operating System (OS) on top of a persistent DSM memory. The persistent DSM operating environment lends itself naturally to an integration of symbol tables, class descriptors and naming during Java program compilation and execut...
Data Placement in a Shared-Nothing Parallel Deductive Database
Until recently most research into parallel databases has focussed on relational database systems. Nevertheless, there is growing interest in more powerful alternative systems such as deductive databases. Several rule handling strategies have been developed to incorporate deductive capabilities into parallel database systems. However, in a shared-nothing environment, the performance of a rule ha...
Larchant-RDOSS: a Distributed Shared Persistent Memory and its Garbage Collector
Larchant-RDOSS is a distributed shared memory that persists on reliable storage across process lifetimes. Memory management is automatic: including consistent caching of data and of locks, collecting objects unreachable from the persistent root, writing reachable objects to disk, and reducing store fragmentation. Memory management is based on a novel garbage collection algorithm, that approxima...