Reducing cache and TLB power by exploiting memory region and privilege level semantics

Authors

  • Zhen Fang
  • Li Zhao
  • Xiaowei Jiang
  • Shih-Lien Lu
  • Ravi Iyer
  • Tong Li
  • Seung Eun Lee
Abstract

DOI: http://dx.doi.org/10.1016/j.sysarc.2013.04.002

This is an extension of the paper "Reducing L1 Caches … Semantics", published at the International Symposium on Low Power Electronics and Design (ISLPED), July 30 – August 1, 2012. New material in this submission mainly includes: optimizations; a detailed analysis of the counter-intuitive phenomenon caused by kernel code; a detailed analysis of cache occupancy by accesses with different semantics; additional evaluation benchmarks; and sensitivity studies including SMT and OoO. This work was done when all authors were employed at … . Corresponding author: Z. Fang ([email protected]), Tel.: +1 503 750 6614.

The L1 cache in today's high-performance processors accesses all ways of a selected set in parallel. This constitutes a major source of energy inefficiency: at most one of the N fetched blocks can be useful in an N-way set-associative cache; the other N-1 cache lines will all be tag mismatches and subsequently discarded. We propose to eliminate unnecessary associative fetches by exploiting certain software semantics in cache design, thus reducing dynamic power consumption. Specifically, we use memory region information to eliminate unnecessary fetches in the data cache, and ring-level information to optimize fetches in the instruction cache. We present a design that is performance-neutral, transparent to applications, and incurs a space overhead of a mere 0.41% of the L1 cache. We show significantly reduced cache lookups with benchmarks including SPEC CPU, SPECjbb, SPECjAppServer, PARSEC, and Apache. For example, for SPEC CPU 2006, the proposed mechanism helps to reduce cache block fetches from the data and instruction caches by an average of 29% and 53% respectively, resulting in power savings of 17% and 35% in the caches, compared to the aggressively clock-gated baselines. © 2013 Elsevier B.V. All rights reserved.
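The way-filtering idea in the abstract can be illustrated with a small software model. The sketch below is not the paper's hardware design: the fixed address-range region classifier, the cache geometry, and the trivial replacement policy are all assumptions made for illustration. It probes only the ways whose stored region tag matches the region of the incoming address, and counts how many way fetches that saves versus a conventional parallel lookup:

```python
# Toy model of semantics-aware way filtering in a set-associative cache.
# Assumption: the memory region (stack / heap / code) of an access is
# derivable from fixed address ranges; a real design would use software
# semantics exposed to the hardware instead.

NUM_SETS = 64
NUM_WAYS = 4
BLOCK_BITS = 6  # 64-byte blocks

def region_of(addr):
    """Hypothetical classifier: fixed address ranges per region."""
    if addr >= 0xC000_0000:
        return "stack"
    if addr >= 0x4000_0000:
        return "heap"
    return "code"

class Cache:
    def __init__(self):
        # Each way holds (tag, region) or None when invalid.
        self.sets = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
        self.ways_fetched = 0   # way fetches with region filtering
        self.ways_baseline = 0  # way fetches in a conventional lookup

    def lookup(self, addr):
        blk = addr >> BLOCK_BITS
        idx = blk % NUM_SETS
        tag = blk // NUM_SETS
        reg = region_of(addr)
        self.ways_baseline += NUM_WAYS  # baseline probes every way
        hit = False
        victim = 0
        for w, line in enumerate(self.sets[idx]):
            if line is None:
                victim = w       # remember an empty way for fills
                continue
            stored_tag, stored_reg = line
            if stored_reg != reg:
                continue         # filtered: this way is never fetched
            self.ways_fetched += 1
            if stored_tag == tag:
                hit = True
        if not hit:
            # Simplistic fill: use an empty way, else overwrite way 0.
            self.sets[idx][victim] = (tag, reg)
        return hit
```

Filtering is safe here because a line cached from one region can never be a tag match for an address in a different region, so skipping those ways loses no hits; the paper's design similarly stores a few semantic bits per cache line to gate the associative fetch.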

Similar articles

Estimating Cache and TLB Power in Embedded Processor Using Complete Machine Simulation

In this paper we propose to combine power estimation and optimization techniques for the cache and TLB components of embedded systems. The power estimation is done at the architectural level using a complete machine simulation model, whereas the optimization is done at the circuit level by applying low-power design techniques to content-addressable memory. It has been shown that for accuracy and flexi...


Reduction in Cache Memory Power Consumption based on Replacement Quantity

Today, power consumption is considered one of the most important design issues, so its reduction plays a considerable role in developing systems. Previous studies have shown that approximately 50% of total power consumption is used in cache memories. There is a direct relationship between power consumption and the number of replacements made in the cache. The less the number of replacements is, the less...


Optimised Predecessor Data Structures for Internal Memory

We demonstrate the importance of reducing misses in the translation-lookaside buffer (TLB) for obtaining good performance on modern computer architectures. We focus on data structures for the dynamic predecessor problem: to maintain a set S of keys from a totally ordered universe under insertions, deletions and predecessor queries. We give two general techniques for simultaneously reducing cache...


Memory Contexts: Supporting Selectable Cache and TLB Contexts

In this paper I argue that in addition to supporting multiple cores, future microprocessor designs should decouple cores from caches and TLBs and support multiple, run-time selectable hardware memory contexts per core. Multiple memory contexts would allow the operating system and other threads to run without polluting each other's cache and TLB contexts, reduce coherence traffic, and enable bet...



Journal:
  • Journal of Systems Architecture - Embedded Systems Design

Volume: 59  Issue: —

Pages: —

Publication year: 2013