Dealing with Traffic-Area Trade-Off in Direct Coherence Protocols for Many-Core CMPs
Abstract
In many-core CMP architectures, the cache coherence protocol is a key component, since it can add area and power-consumption requirements to the final design and could therefore severely restrict its scalability. Area constraints limit the use of precise sharing codes to small- or medium-scale CMPs, while power constraints make broadcast-based protocols impractical for large-scale CMPs. Token-CMP and DiCo-CMP are cache coherence protocols that have recently been proposed to avoid the indirection problem of traditional directory-based protocols. However, Token-CMP relies on broadcasting requests to all tiles, while DiCo-CMP adds a precise sharing code to each cache entry. In this work, we address the traffic-area trade-off for these indirection-aware protocols. In particular, we propose and evaluate several implementations of DiCo-CMP that differ in the amount of coherence information they must store. Our evaluation results show that our proposals achieve a good traffic-area trade-off, halving traffic requirements compared to Token-CMP and considerably reducing the storage area required by DiCo-CMP.
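To make the two sides of the trade-off concrete, the sketch below compares per-entry sharing-code storage and per-miss request messages as the tile count N grows. It is illustrative only and is not the set of DiCo-CMP implementations evaluated in the paper: the function names are hypothetical, the 64-byte block size is an assumption, the "limited pointer" format is a generic textbook sharing code (which needs some fallback, such as broadcast, when the sharers overflow the pointers), and the unicast count assumes the requester already knows which tile holds the block. Only request messages are counted; invalidations and responses are ignored.

```python
import math


def full_map_bits(n_tiles: int) -> int:
    """Precise (full-map) sharing code: one presence bit per tile."""
    return n_tiles


def limited_pointer_bits(n_tiles: int, n_pointers: int) -> int:
    """Hypothetical limited-pointer code: each pointer names one tile in log2(N) bits."""
    return n_pointers * math.ceil(math.log2(n_tiles))


def overhead_pct(code_bits: int, block_bytes: int = 64) -> float:
    """Sharing-code bits as a percentage of the data bits in one cache block (64 B assumed)."""
    return 100.0 * code_bits / (block_bytes * 8)


def request_messages(n_tiles: int, broadcast: bool) -> int:
    """Request messages injected per miss: a broadcast reaches every other tile,
    while a direct (unicast) request goes only to the assumed holder tile."""
    return n_tiles - 1 if broadcast else 1


if __name__ == "__main__":
    for n in (16, 64, 256):
        fm = full_map_bits(n)
        lp = limited_pointer_bits(n, 1)
        print(f"N={n:4d}  full-map {fm:4d} b ({overhead_pct(fm):5.1f}%)  "
              f"1-pointer {lp:2d} b ({overhead_pct(lp):4.1f}%)  "
              f"broadcast {request_messages(n, True):4d} msgs vs unicast {request_messages(n, False)}")
```

The full-map cost grows linearly with N (at 256 tiles it equals half the data bits of a 64-byte block), whereas the broadcast request count also grows linearly with N. Storing less sharing information keeps area low but forces more broadcast-like traffic, which is the tension the abstract describes.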
Similar resources
Evaluation of Low-Overhead Organizations for the Directory in Future Many-Core CMPs
If current trends continue, today’s small-scale general-purpose CMPs will soon be replaced by multi-core architectures integrating tens or even hundreds of cores on-chip. Most likely, some of these many-core CMPs will implement the hardware-managed, implicitly-addressed, coherent caches memory model. Cache coherence in these designs will probably be maintained through a directory-based cache co...
Concerning with On-Chip Network Features to Improve Cache Coherence Protocols for CMPs
Chip multiprocessors (CMPs) with an on-chip network connecting processor cores have been widely accepted as a promising technology to efficiently utilize the ever-increasing density of transistors on a chip. Communications in CMPs require invalidating cached copies of a shared data block. The coherence traffic incurs more and more significant overhead as the number of cores in a CMP increases...
Directoryless shared memory architecture using thread migration and remote access
Distributed directory cache coherence protocols for current many-core CMPs are not only difficult and error-prone to implement and verify, but also provide suboptimal performance when a thread requires access to large amounts of data distributed across the chip: the data must be brought to the core where the thread is running, incurring delays and energy costs. In this paper, we propose an appr...
A NoC-level Support for Broadcast-based Coherence Protocols
Chip Multiprocessor Systems (CMPs) rely on a cache coherency protocol to maintain memory access coherence between cached data and main memory. The Hammer coherency protocol is appealing as it eliminates most of the space overhead when compared to a directory protocol. However, it generates much more traffic, thus stressing the NoC and having worse performance in terms of power consumption. When...
Architectural Implications of Cache Coherence Protocols with Network Applications on Chip MultiProcessors
Network processors are specialized integrated circuits used to process packets in such network equipment as core routers, edge routers, and access routers. As predicted by Gilder’s law, Internet traffic has doubled each year since 1997 and this trend is showing no signs of abating. Since all emerging network applications which require deep packet classification and security-related processing s...