Data Forwarding in Scalable Shared-Memory Multiprocessors
Abstract
Scalable shared-memory multiprocessors are often slowed down by long-latency memory accesses. One way to cope with this problem is to use data forwarding to overlap memory accesses with computation. With data forwarding, when a processor produces a datum, in addition to updating its own cache, it sends a copy of the datum to the caches of the processors that the compiler has identified as its consumers. As a result, when the consumer processors access the datum, they find it in their caches. This paper addresses two main issues. First, it presents a framework for a compiler algorithm for forwarding. Second, using address traces, it evaluates the performance impact of different levels of support for forwarding. Our simulations of a 32-processor machine show that, on average, slightly optimistic support for forwarding speeds up five applications by 50% for large caches and 30% for small caches. For large caches, most read sharing misses can be eliminated, while for small caches, forwarding rarely increases the number of conflict misses. Overall, support for forwarding in shared-memory multiprocessors promises to deliver good application speedups.
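The forwarding mechanism described in the abstract can be illustrated with a small toy model. The sketch below is not the paper's compiler algorithm or its trace-driven simulator. It assumes unbounded private caches (so conflict misses are not modeled), a simple write-invalidate protocol, and a hypothetical `forward_to` hint standing in for the consumer set the compiler would identify. The names `ToyMachine` and `run` are invented for illustration.

```python
class ToyMachine:
    """Toy model: one unbounded private cache per processor, write-invalidate coherence."""

    def __init__(self, num_procs):
        # Each cache is the set of addresses currently valid in it.
        self.caches = [set() for _ in range(num_procs)]
        self.read_misses = 0

    def write(self, proc, addr, forward_to=()):
        # A write invalidates every other cached copy of the address.
        for p, cache in enumerate(self.caches):
            if p != proc:
                cache.discard(addr)
        # The producer updates its own cache.
        self.caches[proc].add(addr)
        # Forwarding: push a copy to each consumer named by the
        # (hypothetical) compiler-provided hint.
        for consumer in forward_to:
            self.caches[consumer].add(addr)

    def read(self, proc, addr):
        # A read that finds no valid copy counts as a miss and fetches the datum.
        if addr not in self.caches[proc]:
            self.read_misses += 1
            self.caches[proc].add(addr)


def run(forwarding, num_procs=4, iterations=10):
    """Processor 0 repeatedly produces one datum; all other processors consume it."""
    machine = ToyMachine(num_procs)
    consumers = range(1, num_procs)
    addr = 0x100  # arbitrary shared address
    for _ in range(iterations):
        machine.write(0, addr, forward_to=consumers if forwarding else ())
        for c in consumers:
            machine.read(c, addr)
    return machine.read_misses
```

In this model, every consumer read misses when forwarding is off, because the producer's write has just invalidated the consumer's copy. With forwarding on, the copy is already in place when the consumer reads it. The model says nothing about latency, network traffic, or the small-cache behavior that the paper evaluates.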
Similar resources
Scal-Tool: Pinpointing and Quantifying Scalability Bottlenecks in DSM Multiprocessors
Distributed Shared-Memory (DSM) multiprocessors provide an attractive combination of cost-effective commodity architecture and, thanks to the shared-memory abstraction, relative ease of programming. Unfortunately, it is well known that tuning applications for scalable performance in these machines is time-consuming. To address this problem, programmers use performance monitoring tools. However,...
A performance evaluation of cache injection in bus-based shared memory multiprocessors
Bus-based shared memory multiprocessors with private caches and snooping write-invalidate cache coherence protocols are the dominant form of small- to medium-scale parallel machines today. In these systems, high memory latency poses the major hurdle to achieving high performance. One way to cope with this problem is to use various techniques for tolerating high memory latency. Software-controlled ...
Data Prefetching and Data Forwarding in Shared Memory Multiprocessors
This paper studies and compares the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency due to interprocessor communication in cache coherent, shared memory multiprocessors. Two multiprocessor prefetching algorithms are presented and compared. A simple blocked vector prefetching algorithm, considerably less complex than existing software pipelined ...
Cache Injection on Bus Based Multiprocessors
Software-controlled cache prefetching and data forwarding are widely used techniques for tolerating memory latency in shared memory multiprocessors. However, some previous studies show that cache prefetching is not very effective on bus-based multiprocessors, while the effectiveness of data forwarding has not yet been explored in this environment. In this paper, a novel technique called cache in...
Comparing Data Forwarding and Prefetching for Communication-Induced Misses in Shared-Memory MPs
As the difference in speed between processor and memory system continues to increase, it is becoming crucial to develop and refine techniques that enhance the effectiveness of cache hierarchies. Two such techniques are data prefetching and data forwarding. With prefetching, a processor hides the latency of cache misses by requesting the data before it actually needs it. With forwarding, a produce...
Publication date: 1995