Handling Transient and Persistent Imbalance Together in Distributed and Shared Memory

Abstract

The recent rapid increase in the number of cores per chip has resulted in a vast amount of on-node parallelism. Not only is the number of cores per node increasing substantially, but the cores are also becoming heterogeneous, and the high variability in the performance of hardware components introduces imbalance. Applications are also becoming more complex, which results in dynamic load imbalance. Load imbalance can cause a loss of performance and a decrease in system utilization. We address the challenge of handling both transient and persistent load imbalance while maintaining locality and incurring low overhead. In this paper, we propose a new integrated runtime system that combines the Charm++ distributed programming model with concurrent tasks to handle the load imbalance problem. It combines an infrequent, periodic assignment of work to cores based on load measurement with user-created tasks. We integrate OpenMP with Charm++ so that potential tasks can be created via OpenMP's parallel loop construct. This approach is not specific to Charm++: it is also available to MPI applications through the Adaptive MPI implementation. We show the benefit of this integrated runtime system on three different applications: improvements of 2X for ChaNGa on 128K cores and more than 3X for NAMD on 2K cores, as well as a benefit for an MPI application, Kripke, using Adaptive MPI.
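The abstract pairs two mechanisms. Infrequent, measurement-based reassignment of work to cores targets persistent imbalance. Fine-grained tasks that idle cores pick up target transient imbalance. The plain-Python simulation below is a minimal sketch of that split under stated assumptions, not the paper's runtime. It uses neither Charm++ nor OpenMP. The greedy heaviest-first heuristic is an assumed placeholder for whatever load balancing strategy the runtime actually uses, and the function names `greedy_assign` and `run_loop_as_tasks` are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_assign(loads: Dict[str, float], num_cores: int) -> Dict[int, List[str]]:
    """Persistent imbalance: periodically remap work units using measured loads.

    Hypothetical greedy heuristic (an assumption, not necessarily the paper's
    strategy): place the heaviest remaining unit on the least-loaded core.
    """
    # Min-heap of (accumulated load, core id).
    heap: List[Tuple[float, int]] = [(0.0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    mapping: Dict[int, List[str]] = {core: [] for core in range(num_cores)}
    # Visit units by descending measured load; ties broken by name for determinism.
    for name, load in sorted(loads.items(), key=lambda kv: (-kv[1], kv[0])):
        core_load, core = heapq.heappop(heap)
        mapping[core].append(name)
        heapq.heappush(heap, (core_load + load, core))
    return mapping


def run_loop_as_tasks(iter_costs: List[float], busy: List[float], chunk: int) -> List[float]:
    """Transient imbalance: split a parallel loop into chunks that idle cores grab.

    `busy[i]` is the time core i has already spent on its own work. The core
    that becomes free first takes the next chunk, similar in spirit to a
    dynamic loop schedule. Returns each core's finish time.
    """
    finish = list(busy)
    heap = [(t, core) for core, t in enumerate(finish)]
    heapq.heapify(heap)
    for start in range(0, len(iter_costs), chunk):
        t, core = heapq.heappop(heap)  # earliest-free core
        t += sum(iter_costs[start:start + chunk])
        finish[core] = t
        heapq.heappush(heap, (t, core))
    return finish


if __name__ == "__main__":
    mapping = greedy_assign({"a": 5.0, "b": 4.0, "c": 3.0, "d": 3.0, "e": 1.0}, 2)
    print(mapping)
    print(run_loop_as_tasks([1.0] * 6, [8.0, 2.0], 1))
```

With measured loads of 5, 4, 3, 3 and 1 on two cores, `greedy_assign` produces an 8/8 split. If one core then finishes its own work early (busy times 8 and 2), cutting a six-iteration loop into unit chunks lets that core absorb all six iterations, so both cores finish at time 8. A static even split would instead finish at 11 and 5.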


Similar Resources

The Region Trap Library: Handling Traps on Application-Defined Regions of Memory

User-level virtual memory (VM) primitives are used in many different application domains including distributed shared memory, persistent objects, garbage collection, and checkpointing. Unfortunately, VM primitives only allow traps to be handled at the granularity of fixed-size pages defined by the operating system and architecture. In many cases, this results in a size mismatch between pages an...


Linking and Loading in a Persistent DSM Operating System

Our native Java compiler directly generates runtime structures in a persistent Distributed Shared Memory (DSM). The compiler has been used to build a general purpose PC Operating System (OS) on top of a persistent DSM memory. The persistent DSM operating environment lends itself naturally to an integration of symbol tables, class descriptors and naming during Java program compilation and execut...


Data Placement in a Shared-Nothing Parallel Deductive Database

Until recently most research into parallel databases has focussed on relational database systems. Nevertheless, there is growing interest in more powerful alternative systems such as deductive databases. Several rule handling strategies have been developed to incorporate deductive capabilities into parallel database systems. However, in a shared-nothing environment, the performance of a rule ha...


Larchant-RDOSS: a Distributed Shared Persistent Memory and its Garbage Collector

Larchant-RDOSS is a distributed shared memory that persists on reliable storage across process lifetimes. Memory management is automatic: including consistent caching of data and of locks, collecting objects unreachable from the persistent root, writing reachable objects to disk, and reducing store fragmentation. Memory management is based on a novel garbage collection algorithm, that approxima...



Publication date: 2016