Reducing cache and TLB power by exploiting memory region and privilege level semantics
Authors
Abstract
The L1 cache in today's high-performance processors accesses all ways of a selected set in parallel. This constitutes a major source of energy inefficiency: at most one of the N fetched blocks can be useful in an N-way set-associative cache. The other N-1 cache lines will all be tag mismatches and subsequently discarded. We propose to eliminate unnecessary associative fetches by exploiting certain software semantics in cache design, thus reducing dynamic power consumption. Specifically, we use memory region information to eliminate unnecessary fetches in the data cache, and ring level information to optimize fetches in the instruction cache. We present a design that is performance-neutral, transparent to applications, and incurs a space overhead of a mere 0.41% of the L1 cache. We show significantly reduced cache lookups with benchmarks including SPEC CPU, SPECjbb, SPECjAppServer, PARSEC, and Apache. For example, for SPEC CPU 2006, the proposed mechanism helps to reduce cache block fetches from the data and instruction caches by an average of 29% and 53% respectively, resulting in power savings of 17% and 35% in the caches, compared to the aggressively clock-gated baselines. © 2013 Elsevier B.V. All rights reserved.

Note: This article extends the paper "Reducing L1 Caches ... Semantics", published in the International Symposium on Low Power Electronics and Design (ISLPED), July 30 - August 1, 2012. New material in this submission includes optimizations; a detailed analysis of the counter-intuitive phenomenon ... by kernel code; a detailed analysis of cache occupancy ... accesses with different semantics; evaluation on additional benchmarks; and sensitivity studies including SMT and OoO ... Corresponding author: Z. Fang ([email protected]), Tel.: +1 5037506614. DOI: http://dx.doi.org/10.1016/j.sysarc.2013.04.002
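To make the mechanism concrete, the following is a minimal software sketch (an illustration under assumed details, not the authors' hardware design): each block in an N-way set carries a small semantic tag assigned at fill time (e.g. a memory-region or ring-level label), and a lookup reads out only the ways whose stored tag matches the incoming access, instead of fetching all N ways in parallel.

```python
# Sketch of semantic way-filtering in one set of an N-way L1 cache.
# Assumption (illustrative): each block stores a "region" label at fill
# time; a lookup fetches only ways whose label matches the access.

class SetAssociativeSet:
    def __init__(self, ways):
        # each way holds (tag, region) or None if invalid
        self.ways = [None] * ways

    def fill(self, way, tag, region):
        self.ways[way] = (tag, region)

    def lookup_baseline(self, tag):
        # conventional L1: every valid way is read out in parallel,
        # even though at most one can hit
        fetched = sum(1 for w in self.ways if w is not None)
        hit = any(w is not None and w[0] == tag for w in self.ways)
        return hit, fetched

    def lookup_filtered(self, tag, region):
        # semantics-aware L1: only ways whose stored region label
        # matches the access's region are read out
        candidates = [w for w in self.ways if w is not None and w[1] == region]
        fetched = len(candidates)
        hit = any(w[0] == tag for w in candidates)
        return hit, fetched

s = SetAssociativeSet(ways=8)
for i in range(8):
    # two kernel blocks and six user blocks share the set
    s.fill(i, tag=i, region=("kernel" if i < 2 else "user"))

print(s.lookup_baseline(tag=5))                    # (True, 8): all 8 ways fetched
print(s.lookup_filtered(tag=5, region="user"))     # (True, 6): only user ways fetched
print(s.lookup_filtered(tag=0, region="kernel"))   # (True, 2): only kernel ways fetched
```

The hit/miss outcome is identical in both lookup paths; only the number of block read-outs, and hence dynamic energy, changes, which is consistent with the performance-neutral claim in the abstract.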
Similar resources
Estimating Cache and TLB Power in Embedded Processor Using Complete Machine Simulation
In this paper we propose combined power estimation and optimization techniques for the cache and TLB components of an embedded system. The power estimation is done at the architectural level using a complete machine simulation model, whereas the optimization is done at the circuit level by applying a low-power design technique to the content-addressable memory. It has been shown that for accuracy and flexi...
Reduction in Cache Memory Power Consumption based on Replacement Quantity
Today, power consumption is considered one of the most important issues, so its reduction plays a considerable role in system design. Previous studies have shown that approximately 50% of total power consumption occurs in cache memories. There is a direct relationship between power consumption and the number of replacements made in the cache: the fewer the replacements, the less...
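The stated relationship between replacements and power can be illustrated with a small simulation (my sketch, not the cited paper's method): counting evictions in an LRU cache over an access trace, with each replacement standing in for the energy cost of a fill.

```python
# Sketch: count cache replacements under LRU for a given access trace.
# Fewer replacements imply fewer power-hungry fills, which is the
# relationship the abstract above describes.
from collections import OrderedDict

def count_replacements(trace, capacity):
    cache = OrderedDict()  # key -> None, ordered from LRU to MRU
    replacements = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: refresh recency
        else:
            if len(cache) >= capacity:     # miss with a full cache:
                cache.popitem(last=False)  # evict the LRU block
                replacements += 1          # ...counting as a replacement
            cache[addr] = None             # fill the new block
    return replacements

# A working set that fits incurs no replacements after warm-up:
print(count_replacements([0, 1, 2, 3] * 10, capacity=4))     # 0
# A working set one block too large thrashes an LRU cache:
print(count_replacements([0, 1, 2, 3, 4] * 10, capacity=4))  # 46
```

The second trace shows the pathological case: with a cyclic working set of 5 blocks in a 4-entry LRU cache, every access after warm-up triggers a replacement.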
Optimised Predecessor Data Structures for Internal Memory
We demonstrate the importance of reducing misses in the translation-lookaside buffer (TLB) for obtaining good performance on modern computer architectures. We focus on data structures for the dynamic predecessor problem: to maintain a set S of keys from a totally ordered universe under insertions, deletions and predecessor queries. We give two general techniques for simultaneously reducing cache...
Memory Contexts: Supporting Selectable Cache and TLB Contexts
In this paper I argue that, in addition to supporting multiple cores, future microprocessor designs should decouple cores from caches and TLBs and support multiple, run-time-selectable hardware memory contexts per core. Multiple memory contexts would allow the operating system and other threads to run without polluting each other's cache and TLB contexts, reduce coherence traffic, and enable bet...
Journal: Journal of Systems Architecture - Embedded Systems Design
Volume: 59
Pages: -
Publication date: 2013