Load Balancing Performance of Dynamic Scheduling on NUMA Multiprocessors

Authors

  • M. D. Durand
  • W. Jalby
Abstract

Self-scheduling is a method for task scheduling in parallel programs in which each processor acquires a new block of tasks for execution whenever it becomes idle. To get the best performance, the block size must be chosen to balance the scheduling overhead against the load imbalance. To determine the best block size, a better understanding of the role of load imbalance in self-scheduling performance is needed. In this paper we study the effect of memory contention on task duration distributions, and hence load balancing, in self-scheduling on a Non-Uniform Memory Access (NUMA) machine. Experimental studies on a BBN TC are used to reveal the strengths and weaknesses of analytical performance models to predict running time and optimal block size. The models are shown to be very accurate for small block sizes. However, the models fail when the block size is large due to a previously unrecognized source of load imbalance. We extend the analytical models to address this failure. The implications for the construction of compilers and runtime systems are discussed.
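
The trade-off between scheduling overhead and load imbalance described in the abstract can be illustrated with a toy simulation. This is a minimal sketch, not the authors' analytical model: the function name `self_schedule`, the fixed per-block `overhead`, and the uniform task durations are all assumptions made for illustration, and memory contention is not modelled.

```python
import heapq

def self_schedule(task_times, num_procs, block_size, overhead):
    """Simulate self-scheduling and return the overall finish time.

    Hypothetical model: whenever a processor becomes idle it takes the
    next `block_size` tasks and pays a fixed `overhead` once per block.
    """
    # Min-heap of the times at which each processor next becomes idle.
    idle_at = [0.0] * num_procs
    heapq.heapify(idle_at)
    next_task = 0
    finish = 0.0
    while next_task < len(task_times):
        t = heapq.heappop(idle_at)  # earliest idle processor
        block = task_times[next_task:next_task + block_size]
        next_task += block_size
        t += overhead + sum(block)  # one scheduling cost per block
        finish = max(finish, t)
        heapq.heappush(idle_at, t)
    return finish

if __name__ == "__main__":
    tasks = [1.0] * 8  # assumed: eight tasks of equal duration
    for k in (1, 4, 8):
        print(k, self_schedule(tasks, num_procs=2, block_size=k, overhead=0.5))
```

With these assumed numbers, a block size of 1 pays the overhead eight times, a block size of 8 leaves one processor idle, and the intermediate block size of 4 finishes first.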


Similar resources

An Extended Gradient Model for NUMA Multiprocessor Systems

In this paper, we present the design and implementation of an effective and scalable dynamic load balancing system for Non-Uniform Memory Access (NUMA) multiprocessors where load balancing is a key issue to achieve adequate efficiency. The proposed load balancing algorithm extends the well-known gradient model to enhance its applicability in a wide range of multiprocessor systems and to improve th...


Hierarchical loop scheduling for clustered NUMA machines

Loop scheduling is an important issue in the development of high performance multiprocessors. As modern multiprocessors have high and non-uniform memory access (NUMA) costs, the communication costs dominate the execution of parallel programs. Previous affinity algorithms perform better than dynamic algorithms under non-clustered NUMA multiprocessors, but they suffer heavy overheads when migrating ...


Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors

Load balancing and data locality are the two most important factors affecting the performance of parallel programs running on distributed-memory multiprocessors. A good balancing scheme should evenly distribute the workload among the available processors, and locate the tasks close to their data to reduce communication and idle time. In this paper, we study the load balancing problem of data-pa...


Parallel Classification for Data Mining on Shared-Memory Multiprocessors

We present parallel algorithms for building decision-tree classifiers on shared-memory multiprocessor (SMP) systems. The proposed algorithms span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This basic scheme is extended with task pipelining and dynamic load balancing to yield faster implementations. The task parallel approach u...


Multiprogrammed Parallel Application Scheduling in NUMA Multiprocessors

The invention, acceptance, and proliferation of multiprocessors are primarily a result of the quest to increase computer system performance. The most promising features of multiprocessors are their potential to solve problems faster than previously possible and to solve larger problems than previously possible. Large-scale multiprocessors offer the additional advantage of being able to execute ...



Publication date: 1997