The Benefits of Hardware-Assisted Fine-Grain Multithreading
Authors: Kevin B. Theobald
Abstract
Today there is widespread interest in using off-the-shelf computers to build economical supercomputers. Clusters such as Beowulf can use packages such as MPI to run coarse-grain parallel code. While this usually works well for applications with regular control structures and data distributions, it is not so effective for many irregular applications. For such problems, fine-grain parallel programs express the algorithm more naturally, adapt better to changing conditions, and balance the load more effectively. However, fine-grain parallel computing has overheads which have hurt performance, especially on off-the-shelf systems. We show that fine-grain parallel programming can be supported efficiently on such systems, if there is a suitable program execution model and a small amount of specialized hardware. We present a general model for a thread hierarchy based on fibers and threaded procedures. The former are executed non-preemptively, which allows them to run efficiently on off-the-shelf processors. We show how the remaining features of the model (e.g., interaction between the fibers) can be supported efficiently in a small amount of external hardware assisting a commodity processor, and that this hardware can be added in an evolutionary manner. Experiments show our hardware support significantly reduces multithreading overheads and improves load balancing, leading to substantial improvements in processor utilization and speedups, especially for the most fine-grained benchmarks tested.
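The idea of non-preemptive fibers that interact through explicit signals can be illustrated with a minimal software sketch. This is not the model or hardware described in the abstract: the names `Fiber`, `Scheduler`, `signal`, and `run_demo`, and the use of a per-fiber synchronization counter, are assumptions made only for illustration. Each fiber runs to completion once started, and a fiber becomes ready only after it has received the number of signals it is waiting for.

```python
from collections import deque


class Fiber:
    """A short unit of work that runs to completion (hypothetical illustration)."""

    def __init__(self, name, body, sync_count=0):
        self.name = name
        self.body = body              # callable that receives the scheduler
        self.sync_count = sync_count  # signals still needed before the fiber may run


class Scheduler:
    """Runs ready fibers one at a time, never preempting a running fiber."""

    def __init__(self):
        self.ready = deque()
        self.trace = []

    def spawn(self, fiber):
        # A fiber with no pending signals is ready immediately.
        if fiber.sync_count == 0:
            self.ready.append(fiber)

    def signal(self, fiber):
        # Deliver one signal; the fiber becomes ready when its count reaches zero.
        fiber.sync_count -= 1
        if fiber.sync_count == 0:
            self.ready.append(fiber)

    def run(self):
        while self.ready:
            fiber = self.ready.popleft()
            self.trace.append(fiber.name)
            fiber.body(self)          # non-preemptive: runs until it returns
        return self.trace


def run_demo():
    sched = Scheduler()
    results = {}

    def join_body(s):
        results["sum"] = results["a"] + results["b"]

    # The join fiber waits for two signals, one from each producer.
    join = Fiber("join", join_body, sync_count=2)

    def make_producer(key, value):
        def body(s):
            results[key] = value
            s.signal(join)
        return body

    sched.spawn(Fiber("a", make_producer("a", 1)))
    sched.spawn(Fiber("b", make_producer("b", 2)))
    sched.spawn(join)                 # not ready yet: still waiting on two signals
    return sched.run(), results
```

In this sketch the bookkeeping in `spawn` and `signal` (counting signals and maintaining the ready queue) runs on the same processor as the fibers. That bookkeeping is the kind of inter-fiber interaction the abstract proposes to support in a small amount of external hardware.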
Similar resources
A Multithreaded Parallel Implementation of a Dynamic Programming Algorithm for Sequence Comparison
This paper discusses the issues involved in implementing a dynamic programming algorithm for biological sequence comparison on a general-purpose parallel computing platform based on a fine-grain event-driven multithreaded program execution model. Fine-grain multithreading permits efficient parallelism exploitation in this application both by taking advantage of asynchronous point-to-point synch...
Cache-Affinity Scheduling for Fine Grain Multithreading
Cache utilisation is often very poor in multithreaded applications, due to the loss of data access locality incurred by frequent context switching. This problem is compounded on shared memory multiprocessors when dynamic load balancing is introduced and thread migration disrupts cache content. In this paper, we present a technique, which we refer to as ‘batching’, for reducing the negative impa...
Performance Analysis of Enhanced Fine–grain Multithreaded Distributed–memory Systems
In fine–grain multithreading, the thread changes in each processor cycle, consecutive instructions are thus issued from different threads, and no data dependencies stall the pipeline. Enhanced fine–grain multithreading maintains a number of additional threads which are used to replace an active thread when it initiates a long–latency operation. Performance improvements due to enhanced multithre...
Memory Latency Reduction with Fine-grain Migrating Threads in NUMA Shared-memory Multiprocessors
In order to fully realize the potential performance benefits of large-scale NUMA shared memory multiprocessors, efficient techniques to reduce/tolerate long memory access latencies in such systems are to be developed. This paper discusses the concept, software and hardware support for memory latency reduction through fine-grain non-transparent thread migration, referred to as mobile multithread...
Simultaneous Multithreading: Maximizing On-Chip Parallelism (Proceedings of the 22nd Annual International Symposium on Computer Architecture, 1995)
This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing ...