The Benefits of Hardware-Assisted Fine-Grain Multithreading

Authors

  • Kevin B. Theobald
  • Guang R. Gao
Abstract

Today there is widespread interest in using off-the-shelf computers to build economical supercomputers. Clusters such as Beowulf can use packages such as MPI to run coarse-grain parallel code. While this usually works well for applications with regular control structures and data distributions, it is not so effective for many irregular applications. For such problems, fine-grain parallel programs express the algorithm more naturally, adapt better to changing conditions, and balance the load more effectively. However, fine-grain parallel computing has overheads which have hurt performance, especially on off-the-shelf systems. We show that fine-grain parallel programming can be supported efficiently on such systems, if there is a suitable program execution model and a small amount of specialized hardware. We present a general model for a thread hierarchy based on fibers and threaded procedures. The former are executed non-preemptively, which allows them to run efficiently on off-the-shelf processors. We show how the remaining features of the model (e.g., interaction between the fibers) can be supported efficiently in a small amount of external hardware assisting a commodity processor, and that this hardware can be added in an evolutionary manner. Experiments show our hardware support significantly reduces multithreading overheads and improves load balancing, leading to substantial improvements in processor utilization and speedups, especially for the most fine-grained benchmarks tested.
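The idea of non-preemptive fibers that interact through synchronization can be illustrated with a small software sketch. The abstract does not give the paper's actual interfaces, so every name here (`Fiber`, `Scheduler`, `sync_count`, `signal`, `demo`) is hypothetical, and the counter-based enabling rule is an assumption chosen for illustration, not the authors' design. In the paper this bookkeeping is what the external hardware is meant to assist; here it is plain Python.

```python
from collections import deque


class Fiber:
    """A run-to-completion unit of work (hypothetical illustration).

    sync_count is the number of signals the fiber still needs
    before it becomes eligible to run.
    """

    def __init__(self, name, body, sync_count=0):
        self.name = name
        self.body = body              # callable that receives the scheduler
        self.sync_count = sync_count


class Scheduler:
    """Non-preemptive scheduler: a started fiber always runs to the end."""

    def __init__(self):
        self.ready = deque()  # fibers whose sync_count has reached zero
        self.trace = []       # order in which fibers actually ran

    def add(self, fiber):
        # A fiber with no pending dependencies is ready immediately.
        if fiber.sync_count == 0:
            self.ready.append(fiber)
        return fiber

    def signal(self, fiber):
        # One dependency satisfied; enable the fiber when none remain.
        fiber.sync_count -= 1
        if fiber.sync_count == 0:
            self.ready.append(fiber)

    def run(self):
        while self.ready:
            fiber = self.ready.popleft()
            self.trace.append(fiber.name)
            fiber.body(self)  # never interrupted partway through


def demo():
    sched = Scheduler()
    results = {}

    def combine(_sched):
        results["sum"] = results["a"] + results["b"]

    # The consumer needs two signals, one from each producer.
    join = sched.add(Fiber("combine", combine, sync_count=2))

    def make_producer(key, value):
        def body(s):
            results[key] = value
            s.signal(join)
        return body

    sched.add(Fiber("produce_a", make_producer("a", 1)))
    sched.add(Fiber("produce_b", make_producer("b", 2)))
    sched.run()
    return sched.trace, results["sum"]
```

Because no fiber is ever suspended mid-body, the scheduler needs no saved register state or locks around `results`; the only shared bookkeeping is the counter and the ready queue, which is the kind of small, regular work that could plausibly be moved off the main processor.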


Similar resources

A Multithreaded Parallel Implementation of a Dynamic Programming Algorithm for Sequence Comparison

This paper discusses the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform based on a fine-grain event-driven multithreaded program execution model. Fine-grain multithreading permits efficient parallelism exploitation in this application both by taking advantage of asynchronous point-to-point synch...


Cache-Affinity Scheduling for Fine Grain Multithreading

Cache utilisation is often very poor in multithreaded applications, due to the loss of data access locality incurred by frequent context switching. This problem is compounded on shared memory multiprocessors when dynamic load balancing is introduced and thread migration disrupts cache content. In this paper, we present a technique, which we refer to as ‘batching’, for reducing the negative impa...


Performance Analysis of Enhanced Fine–grain Multithreaded Distributed–memory Systems

In fine–grain multithreading, the thread changes in each processor cycle, consecutive instructions are thus issued from different threads, and no data dependencies stall the pipeline. Enhanced fine–grain multithreading maintains a number of additional threads which are used to replace an active thread when it initiates a long–latency operation. Performance improvements due to enhanced multithre...


Memory Latency Reduction with Fine-grain Migrating Threads in NUMA Shared-memory Multiprocessors

In order to fully realize the potential performance benefits of large-scale NUMA shared memory multiprocessors, efficient techniques to reduce/tolerate long memory access latencies in such systems are to be developed. This paper discusses the concept, software and hardware support for memory latency reduction through fine-grain non-transparent thread migration, referred to as mobile multithread...


Simultaneous Multithreading: Maximizing On-Chip Parallelism - Computer Architecture, 1995. Proceedings., 22nd Annual International Symposium on

This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing ...




Publication date: 2012