A Comparison of Software and Hardware Synchronization Mechanisms for Distributed Shared Memory Multiprocessors
Authors
Abstract
Efficient synchronization is an essential component of parallel computing. The designers of traditional multiprocessors have included hardware support only for simple operations such as compare-and-swap and load-linked/store-conditional, while high-level synchronization primitives such as locks, barriers, and condition variables have been implemented in software. With the advent of directory-based distributed shared memory (DSM) multiprocessors with significant flexibility in their cache controllers, it is worthwhile considering whether this flexibility should be used to support higher-level synchronization primitives in hardware. In particular, as part of maintaining data consistency, these architectures maintain lists of processors with a copy of a given cache line, which is most of the hardware needed to implement distributed locks. We studied two software and four hardware implementations of locks and found that hardware implementation can reduce lock acquire and release times by compared to well-tuned software locks. In terms of macrobenchmark performance, hardware locks reduce application running times by up to on a synthetic benchmark with heavy lock contention and by on a suite of SPLASH benchmarks. In addition, emerging cache coherence protocols promise to increase the time spent synchronizing relative to the time spent accessing shared data, and our study shows that hardware locks can reduce SPLASH execution times by up to if the time spent accessing shared data is small. Although the overall performance impact of hardware lock mechanisms varies tremendously depending on the application, the added hardware complexity on a flexible architecture like FLASH or Avalanche is negligible, and thus hardware support for high-level synchronization operations should be provided.
This work was supported by the Space and Naval Warfare Systems Command (SPAWAR) and Advanced Research Projects Agency (ARPA), Communication and Memory Architectures for Scalable Parallel Computing, ARPA order B under SPAWAR contract N C.
Similar Resources
Comparative Evaluation of Fine- and Coarse-Grain Approaches for Software Distributed Shared Memory
Symmetric multiprocessors (SMPs) connected with low-latency networks provide attractive building blocks for software distributed shared memory systems. Two distinct approaches have been used: the fine-grain approach that instruments application loads and stores to support a small coherence granularity, and the coarse-grain approach based on virtual memory hardware that provides coherence at a p...
System Software Support for Reducing Memory Latency on Distributed Shared Memory Multiprocessors
This paper overviews results from our recent work on building customized system software support for Distributed Shared Memory Multiprocessors. The mechanisms and policies outlined in this paper are connected with a single conceptual thread: they all attempt to reduce the memory latency of parallel programs by optimizing critical system services, while hiding the complex architectural details o...
Hardware Supported Synchronization Primitives for Clusters
Parallel architectures with shared memory are well suited to many applications, provided that efficient shared memory access and process synchronization mechanisms are available. When the parallel machine is a cluster with physically distributed memory, software based synchronization mechanisms together with virtual memory infrastructure can implement Software Distributed Shared Memory (S-DSM),...
Multigrain Shared Memory
Parallel workstations, each comprising a 10-100 processor shared memory machine, promise cost-effective general-purpose multiprocessing. This thesis explores the coupling of such small- to medium-scale shared memory multiprocessors through software over a local area network to synthesize larger shared memory systems. Multiprocessors built in this fashion are called Distributed Scalable Shared memo...
A New Synchronization Scheme for Memory Consistency Model (Extended Abstract)
Modernistic scalable multiprocessors are mostly built with a distributed-shared memory architecture. Large scale shared memory multiprocessors have long memory latencies for the remote memory access. And these latencies can quickly offset system performance earned from the exploitation of parallelism. In order to improve system performance, we must reduce memory latencies. The useful way for th...