Nonuniform Cache Architectures for Wire-Delay Dominated On-Chip Caches
Authors
Abstract
0272-1732/03/$17.00 © 2003 IEEE. Published by the IEEE Computer Society.
The next generation of today's high-performance processors will incorporate large level-two caches on the processor die. For example, the IBM Power5 will contain a 1.92-Mbyte L2 cache, the Hewlett-Packard PA8700 will contain 2.25 Mbytes of unified on-chip cache, and the Intel Itanium2 will contain 6 Mbytes of on-chip L3 cache. Cache sizes will continue to increase as bandwidth demands on the package grow and as smaller technologies permit more bits per square millimeter. However, increasing global wire delays across the chip will make large on-chip caches with a single, discrete hit latency undesirable in future technologies. Data residing near the processor in a large cache can be accessed much more quickly than data residing far from the processor. Accessing the closest bank in a 16-Mbyte on-chip L2 cache built in a 50-nm process technology, for example, could take four cycles, whereas accessing the farthest bank might take 47 cycles. The bulk of the access time involves routing to and from the banks rather than the bank accesses themselves. Nonuniform cache access (NUCA) designs address this wire-delay problem. In this approach, a switched network allows data to migrate to different cache regions according to access frequency; that is, frequently accessed data migrates to areas closer to the processor. We propose several designs that treat the cache as a network of banks and facilitate nonuniform accesses to different physical regions. NUCA architectures offer low-latency access, increased scalability, and greater performance stability than conventional uniform-access cache architectures.
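The migration idea in the abstract can be illustrated with a minimal Python sketch. Only the two endpoint latencies (4 and 47 cycles) and the notion that frequently accessed data moves closer come from the abstract. The linear latency model, the one-line-per-bank layout, the swap-on-hit promotion rule, and the names `bank_latency` and `MigratingBankSet` are assumptions made for illustration, not the paper's actual design.

```python
def bank_latency(bank_index, num_banks, nearest=4, farthest=47):
    """Hit latency in cycles for a bank, index 0 being closest to the processor.

    Assumption: latency grows linearly with distance between the two
    endpoint values quoted in the abstract.
    """
    if num_banks == 1:
        return nearest
    return nearest + (farthest - nearest) * bank_index / (num_banks - 1)


class MigratingBankSet:
    """Toy model: one cache line per bank, banks ordered nearest to farthest."""

    def __init__(self, num_banks):
        self.banks = [None] * num_banks

    def access(self, tag):
        """Return (hit, latency). On a hit, the line moves one bank closer."""
        if tag in self.banks:
            i = self.banks.index(tag)
            latency = bank_latency(i, len(self.banks))
            if i > 0:
                # Hypothetical promotion policy: swap with the next-closer bank.
                self.banks[i - 1], self.banks[i] = self.banks[i], self.banks[i - 1]
            return True, latency
        # Miss: install the line in the farthest bank, evicting its occupant.
        self.banks[-1] = tag
        return False, None


if __name__ == "__main__":
    cache = MigratingBankSet(num_banks=4)
    cache.access("A")  # miss, installs "A" in the farthest bank
    for _ in range(4):
        print(cache.access("A"))  # latency shrinks as "A" migrates closer
```

Under these assumptions, a repeatedly accessed line pays the farthest-bank latency once and then gets progressively cheaper until it settles in the nearest bank.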
Similar resources
Designs Solve the On-Chip Wire Delay Problem for Future Large Integrated Caches. By Embedding a Network in the Cache, NUCA Designs Let Data Migrate
The next generation of today's high-performance processors incorporate large level-two caches on the processor die. For example, the IBM Power5 will contain a 1.92-Mbyte L2 cache, the Hewlett-Packard PA8700 will contain 2.25 Mbytes of unified on-chip cache, and the Intel Itanium2 will contain 6 Mbytes of on-chip L3 cache. Cach...
On-Chip Networks: Impact on the Performance of NUCA Caches
Non Uniform Cache Architectures (NUCA) are a new design paradigm for large last-level on-chip caches and have been introduced to deliver low access latencies in wire-delay dominated environments. Their structure is partitioned into sub-banks and the resulting access latency is a function of the physical position of the requested data. Typically, NUCA caches make use of a switched network to con...
Reducing Sensitivity to NoC Latency in NUCA Caches
Non Uniform Cache Architectures (NUCA) are a novel design paradigm for large last-level on-chip caches which have been introduced to deliver low access latencies in wire-delay dominated environments. Typically, NUCA caches make use of a network-on-chip (NoC) to connect the different sub-banks and the cache controller. This work analyzes how different network parameters, namely hop latency and b...
NUCA: A Non-Uniform Cache Access Architecture for Wire-Delay Dominated On-Chip Caches
This paper describes Non-Uniform Cache Access (NUCA) designs, which solve the on-chip wire delay problem for future large integrated caches. These designs embed a network into the cache itself, allowing data to migrate within the cache, clustering the working set in the cache region nearest to the processor. Today’s high performance processors incorporate large level-two (L2) caches on the proc...
CACTI 6.0: A Tool to Model Large Caches
Naveen Muralimanohar, Rajeev Balasubramonian, Norman P. Jouppi. HP Laboratories, HPL-2009-85. Future processors will likely have large on-chip caches with a possibility of dedicating an entire die for on-chip storage in a 3D stacked design. With the ever-growing disparity between transistor and wire delay, the properties of such larg...
Journal:
- IEEE Micro
Volume: 23
Publication date: 2003