Library Cache Coherence
Authors
Abstract
Directory-based cache coherence is a popular mechanism for chip multiprocessors and multicores. The directory protocol, however, requires multicast for invalidation messages and the collection of acknowledgement messages, which can be expensive in terms of latency and network traffic. Furthermore, the size of the directory increases with the number of cores. We present Library Cache Coherence (LCC), which requires neither broadcast/multicast for invalidations nor waiting for invalidation acknowledgements. A library is a set of timestamps that are used to auto-invalidate shared cache lines and to delay writes to those lines until all shared copies expire. The size of the library is independent of the number of cores. By removing the complex invalidation process of directory-based cache coherence protocols, LCC generates fewer network messages. At the same time, LCC allows reads on a cache block to take place while a write to the block is being delayed, without breaking sequential consistency. As a result, LCC has 1.85x lower average memory latency than a MESI directory-based protocol on our set of benchmarks, even with a simple timestamp-choosing algorithm. Moreover, our experimental results on LCC with an ideal (though not implementable) timestamp scheme show the potential for further improvement with more sophisticated timestamp schemes.
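The timestamp mechanism described in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the class names `Library` and `CachedCopy`, the fixed `lease` length, and the single logical clock passed in as `now` are all assumptions made for illustration. It shows only the two ideas stated above: a shared copy expires on its own once its timestamp passes, and a write waits until the latest outstanding timestamp for that line has passed.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Toy model (hypothetical): one expiry timestamp per line, regardless of core count."""
    lease: int = 10                              # assumed fixed lease length, in cycles
    expiry: dict = field(default_factory=dict)   # line address -> latest lease expiry

    def read(self, addr: int, now: int) -> int:
        """Grant a read-only copy and return the time at which it expires."""
        ts = now + self.lease
        # Only the latest expiry is kept, so no sharer list is needed.
        self.expiry[addr] = max(self.expiry.get(addr, 0), ts)
        return ts

    def write_ready_at(self, addr: int, now: int) -> int:
        """Earliest time a write may proceed: once every shared copy has expired."""
        return max(now, self.expiry.get(addr, 0))


@dataclass
class CachedCopy:
    expires_at: int

    def is_valid(self, now: int) -> bool:
        # The copy invalidates itself; no invalidation message or ack is involved.
        return now < self.expires_at


lib = Library(lease=10)
t1 = lib.read(0x40, now=0)                    # first sharer
t2 = lib.read(0x40, now=5)                    # second sharer extends the latest expiry
write_time = lib.write_ready_at(0x40, now=7)  # write is delayed until t2
```

In this simplified model, a read that arrives while a write is waiting pushes the write back further; how the actual protocol chooses timestamps in that situation is not specified in the abstract.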
Related resources
Multi-Threading Performance on Commodity Multi-Core Processors
Commodity servers based on multi-core processors have recently become building blocks for high-performance computing Linux clusters. Multi-core processors deliver better performance-to-cost ratios than their single-core predecessors through on-chip multi-threading. However, they present challenges in developing high-performance multi-threaded code. In this paper we study the performance of d...
A Software Approach to Unifying Multicore Caches
Multicore chips will have large amounts of fast on-chip cache memory, along with relatively slow DRAM interfaces. The on-chip cache memory, however, will be fragmented and spread over the chip; this distributed arrangement is hard for certain kinds of applications to exploit efficiently, and can lead to needlessly slow DRAM accesses. First, data accessed from many cores may be duplicated in many c...
Exploiting Cache Affinity in Software Cache Coherence
Cache affinity is important to the performance of scalable shared-memory multiprocessors. For multiprocessors without hardware cache coherence support, software cache coherence is the only alternative. Most existing software cache schemes ignore cache affinity across parallel loops. In this paper, we propose a new scheme, the Cache Affinity-based Software cache coherence scheme (CAS), that exploits cach...
Parameterized Cache Coherence Protocol Verification using Invariant
Verification of parameterized cache coherence protocols is very important in shared-memory multiprocessor systems. In this paper, a new method is proposed to verify the correctness of a parameterized cache coherence protocol based on invariants. First, we present the parameterized cache coherence protocol as a semi-algebraic transition system, and then solve the invariant of the transition syste...
Cache Coherence Scaling on Manycore Systems
On-chip cache coherence is in widespread use on mainstream general-purpose computers today. When scaling from multi-core to many-core systems, a hardware-coherent design might become problematic. This paper discusses and evaluates different approaches to cache coherence implementation in many-core systems and whether hardware coherence can remain viable.