Dynamic Last-Level Cache Allocation to Reduce Area and Power Overhead in Directory Coherence Protocols
Authors
Abstract
Last-level caches (LLCs) play an important role in current and future chip multiprocessors (CMPs), since they constitute the last opportunity to avoid expensive off-chip accesses. In a tiled CMP, the LLC is typically shared by all cores but physically distributed across the chip, providing a large, banked, highly associative global memory. The memory hierarchy is orchestrated by a directory-based coherence protocol whose directory is typically associated with the LLC banks. The LLC and directory structure occupy a significant share of chip area and contribute substantially to total chip leakage energy. To counter these effects, this paper reorganizes the LLC and the directory by decoupling tag and data entry allocation and by exploiting the high percentage of private data typically found in CMP systems. Private blocks are kept in the L1 caches, while the LLC is reorganized to provide fewer L2 data entries but still hold directory entries for private data, thus maximizing on-chip memory reuse. Evaluation results show a negligible impact on execution time while achieving 45% area savings and 75% static power savings. More aggressive designs achieve 80% area and 82% static power savings at the cost of a 10% performance degradation.
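The abstract does not spell out the hardware organization, so the following is only a toy Python model, under our own assumptions, of what "decoupling tag and data entry allocation" can mean for one LLC bank. The tag array (which doubles as the directory) has more entries than the data array. A block touched by a single core gets a tag/directory entry only, with its data left in that core's L1; a data entry is allocated only once a second core shares the block. The names `DecoupledLLCBank` and `access`, the LRU replacement, and the "allocate data on second sharer" policy are hypothetical illustrations, not the paper's design.

```python
from collections import OrderedDict
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TagEntry:
    """Tag/directory entry: allocated for every tracked block."""
    sharers: Set[int] = field(default_factory=set)
    data_slot: Optional[int] = None  # index into the data array, or None


class DecoupledLLCBank:
    """Hypothetical LLC bank with fewer data entries than tag entries.

    Assumption: private blocks (one sharer) hold only a directory entry and
    their data stays in the owner's L1; shared blocks also get a data entry.
    """

    def __init__(self, num_tags: int, num_data: int):
        assert num_data <= num_tags
        self.num_tags = num_tags
        self.tags = OrderedDict()        # block -> TagEntry, in LRU order
        self.free_slots = list(range(num_data))
        self.data_owner = OrderedDict()  # slot -> block, oldest allocation first

    def _evict_tag(self) -> None:
        # A real inclusive design would also invalidate the L1 copies of the
        # victim; this toy model only releases the victim's data slot.
        _victim, entry = self.tags.popitem(last=False)
        if entry.data_slot is not None:
            del self.data_owner[entry.data_slot]
            self.free_slots.append(entry.data_slot)

    def _alloc_data(self, block: int) -> int:
        if self.free_slots:
            slot = self.free_slots.pop()
        else:
            # Reclaim the oldest data slot; its block keeps its directory
            # entry, and its data remains in the sharers' L1 caches.
            slot, victim = self.data_owner.popitem(last=False)
            self.tags[victim].data_slot = None
        self.data_owner[slot] = block
        return slot

    def access(self, block: int, core: int) -> str:
        entry = self.tags.get(block)
        if entry is None:
            if len(self.tags) >= self.num_tags:
                self._evict_tag()
            self.tags[block] = TagEntry(sharers={core})
            return "private: tag only"
        self.tags.move_to_end(block)  # mark as most recently used
        entry.sharers.add(core)
        if len(entry.sharers) == 1:
            return "private: tag only"
        if entry.data_slot is None:
            entry.data_slot = self._alloc_data(block)
            return "shared: data allocated"
        return "shared"


if __name__ == "__main__":
    bank = DecoupledLLCBank(num_tags=4, num_data=1)
    print(bank.access(0x10, core=0))  # first touch by one core: tag only
    print(bank.access(0x10, core=2))  # second core shares it: data entry allocated
```

In this sketch the area saving comes from sizing `num_data` well below `num_tags`: if most blocks are private, most tag entries never need a companion data entry.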
Similar resources
Two proposals for the inclusion of directory information in the last-level private caches of glueless shared-memory multiprocessors
In glueless shared-memory multiprocessors where cache coherence is usually maintained using a directory-based protocol, the fast access to the on-chip components (caches and network router, among others) contrasts with the much slower main memory. Unfortunately, directory-based protocols need to obtain the sharing status of every memory block before coherence actions can be performed. This info...
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
Directories have been used to maintain cache coherency in shared memory multiprocessors with private caches. The traditional full map directory tracks the exact caching status for each shared memory block and is designed to be efficient and simple. Unfortunately, the inherent directory size explosion makes it unsuitable for large-scale multiprocessors. In this paper, we propose a new directory...
Architectural Support for an Efficient Implementation of a Software-Only Directory Cache Coherence Protocol
Software-only directory cache coherence protocols emulate directory management by handlers executed on the compute processor in shared-memory multiprocessors. While their potential lies in lower implementation cost and complexity than traditional hardware-only directory protocols, the miss penalty for cache misses induced by application data accesses as well as directory accesses is a critical ...
A scalable organization for distributed directories
Although directory-based cache-coherence protocols are the best choice when designing chip multiprocessors with tens of cores on-chip, the memory overhead introduced by the directory structure may not scale gracefully with the number of cores. Many approaches aimed at improving the scalability of directories have been proposed. However, they do not bring perfect scalability and usually reduce t...
Photonic Architectures for Distributed Shared Memory Multiprocessors
This paper studies the interaction between the access protocol used to provide arbitration for a wavelength-division multiple access photonic network and the cache coherence protocol required to support a distributed shared memory environment. The architecture is based on wavelength division multiplexing which enables multiple multi-access channels to be realized on a single optical fiber. Large...