Eager Combining: a Coherency Protocol for Increasing Effective Network and Memory Bandwidth in Shared-memory Multiprocessors
Authors
Abstract
One common cause of poor performance in large-scale shared-memory multiprocessors is limited memory or interconnection network bandwidth. Even well-designed machines can exhibit bandwidth limitations when a program issues an excessive number of remote memory accesses or when remote accesses are distributed non-uniformly. While techniques for improving locality of reference are often successful at reducing the number of remote references, a non-uniform distribution of references may still result, which can cause contention both in the interconnection network and at remote memories. Producer/consumer data, where one processor (the producer) writes data that many other processors (the consumers) must read, is a common sharing pattern in parallel programs that generates a non-uniform distribution of references. In this paper we quantify the performance impact of producer/consumer sharing as a function of memory and network bandwidth, and argue that the contention caused by this form of sharing can severely impact performance on large-scale machines. We then propose a new coherency protocol, called eager combining, which is designed to alleviate this contention. The protocol replicates the producer's data among multiple memory modules, thereby effectively increasing both the memory and network bandwidth of the producer, and dramatically decreasing the remote access latency of consumers. We compare eager combining to other techniques for reducing or eliminating contention, and use execution-driven simulation of parallel programs on a large-scale multiprocessor to show that eager combining can improve performance by a factor of 4 or more when used for programs with producer/consumer data on machines with hundreds of processors.
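The contention argument in the abstract can be illustrated with a toy load count. This is only a sketch under stated assumptions, not the paper's protocol or simulator: the function names, the round-robin assignment of consumers to replicas, and the sample sizes are all hypothetical. It shows how the busiest memory module's read load might shrink when a producer's data is replicated across several modules.

```python
from collections import Counter


def module_loads(num_consumers: int, num_replicas: int) -> Counter:
    """Count reads arriving at each replica-holding memory module.

    Assumption (not from the paper): each consumer reads once and is
    assigned to a replica round-robin.
    """
    loads = Counter()
    for consumer in range(num_consumers):
        loads[consumer % num_replicas] += 1
    return loads


def max_load(num_consumers: int, num_replicas: int) -> int:
    """Reads served by the busiest module, a rough proxy for contention."""
    return max(module_loads(num_consumers, num_replicas).values())


if __name__ == "__main__":
    # Hypothetical machine with 256 consumers of one producer's data.
    for replicas in (1, 4, 16):
        print(f"replicas={replicas} busiest module serves "
              f"{max_load(256, replicas)} reads")
```

With a single copy, every consumer read lands on one module; with more replicas, the same reads are spread out. A real protocol would also have to pay for propagating the producer's writes to each replica, which this sketch ignores.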
Related papers
Eager combining: a coherency protocol for increasing effective network and memory bandwidth in shared-memory multiprocessors
An excessive number of remote accesses or a non-uniform distribution of remote accesses can cause even well-designed multiprocessors to exhibit severe memory and network contention. Producer/consumer data generates a particularly common sharing pattern that results in a non-uniform distribution of references. In this paper we quantify the performance impact of producer/consumer sharing as a fun...
Adaptive Cache Coherency for Detecting Migratory Shared
Parallel programs exhibit a small number of distinct data-sharing patterns. A common data-sharing pattern, migratory access, is characterized by exclusive read and write access by one processor at a time to a shared datum. We describe a family of adaptive cache coherency protocols that dynamically identify migratory shared data in order to reduce the cost of moving them. The protocols use a sta...
Meshes vs. Hypercubes: A case study for Distributed Shared-memory Multiprocessors
Distributed shared-memory multiprocessors (DSMs) are gaining acceptance because they are easier to program than multicomputers. Recently proposed DSMs use a direct interconnection network to access remote memory locations, making these architectures scalable. Most DSMs implement a cache coherence protocol in hardware. This protocol exchanges data and control messages through the interconnection n...
Automatic Generation of Verifiable Cache Coherence
Performance modelling and verification are vital steps in the development cycle of any cache coherency protocol. Two separate models are usually required to perform each analysis step and as protocols become increasingly complex each can become correspondingly unwieldy. We examine how stochastic process algebra can be used to describe cache coherency protocols in such a way as to allow both the ...
Reducing Controller Contention in Shared-Memory Multiprocessors Using Combining and Two-Phase Routing
In simple cache coherency protocols, serialisation can occur when many simultaneous accesses are made to data held in a single node, and when many accesses involve a common "home" node controller. This is ameliorated in various designs with a hierarchical or clustered structure. In this paper we investigate the idea of routing requests via an intermediate "proxy" node where combining is used to...