IPU/LTB: A Method for Reducing Effective Memory Latency

Authors

  • C. Reid Harmon
  • Bill Appelbe
  • Raja Das
Abstract

This paper describes a new hardware approach to data and instruction prefetching for superscalar processors. The key innovation is instruction prefetching by predicting procedural control flow, and decoupling data and instruction prefetching. Simulation results show this method to recover 72% of unnecessarily lost cache cycles and to yield a substantial improvement (20-27%) over previous hardware prefetching techniques. The technique has a relatively small cost in hardware, and is intended to come between the processor and a level-1 cache.
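The decoupled-prefetch hardware itself cannot be reproduced from the abstract alone, but the effect the paper measures, recovering fetch cycles otherwise lost to cache misses, can be illustrated with a toy simulation. The sketch below models a direct-mapped cache with an optional sequential next-block prefetcher; the sizes (SETS, BLOCK) and the next-block policy are illustrative assumptions, not the IPU/LTB design.

```c
#include <stdbool.h>

/* Toy direct-mapped cache with optional sequential (next-block) prefetch.
   This is NOT the paper's IPU/LTB mechanism -- just a minimal illustration
   of how prefetching recovers misses on a streaming instruction fetch. */
#define SETS  16
#define BLOCK 4   /* words per cache block */

typedef struct {
    long tags[SETS];
    bool valid[SETS];
} Cache;

/* Look up addr; on a miss, fill the block. Returns true on a hit. */
static bool cache_access(Cache *c, long addr) {
    long blk = addr / BLOCK;
    int set = (int)(blk % SETS);
    if (c->valid[set] && c->tags[set] == blk) return true;
    c->valid[set] = true;
    c->tags[set]  = blk;
    return false;
}

/* Nonbinding prefetch: fill the block holding addr without counting it
   as a demand access. */
static void prefetch(Cache *c, long addr) {
    long blk = addr / BLOCK;
    int set = (int)(blk % SETS);
    c->valid[set] = true;
    c->tags[set]  = blk;
}

/* Fetch a sequential instruction stream of n words; return demand misses. */
int run_stream(int n, bool do_prefetch) {
    Cache c = {0};
    int misses = 0;
    for (long pc = 0; pc < n; pc++) {
        if (!cache_access(&c, pc)) misses++;
        if (do_prefetch)
            prefetch(&c, pc + BLOCK);  /* bring the next block in early */
    }
    return misses;
}
```

On a purely sequential stream the prefetcher hides all but the first (cold) miss; the paper's contribution is making this work across procedure calls by predicting control flow rather than assuming sequential fetch.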


Similar resources

Hardware and Software Mechanisms for Reducing Load Latency

As processor demands quickly outpace memory, the performance of load instructions becomes an increasingly critical component to good system performance. This thesis contributes four novel load latency reduction techniques, each targeting a different component of load latency: address calculation, data cache access, address translation, and data cache misses. The contributed techniques are as fol...

Full text

Blocking Linear Algebra Codes for Memory Hierarchies

Because computation speed and memory size are both increasing, the latency of memory, in basic machine cycles, is also increasing. As a result, recent compiler research has focused on reducing the effective latency by restructuring programs to take more advantage of high-speed intermediate memory (or cache, as it is usually called). The problem is that many real-world programs are non-trivial to...

Full text
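The restructuring this abstract refers to is classically loop blocking (tiling): reordering a loop nest over small tiles so operands are reused while still cache-resident. A minimal sketch in C, where the matrix size N and tile edge B are illustrative assumptions:

```c
#include <string.h>

#define N 32   /* matrix dimension (illustrative) */
#define B 8    /* tile edge; assumed small enough that a few BxB tiles fit in cache */

/* Naive i-j-k multiply: c = a * b. For large N, the columns of b are
   streamed from memory once per row of a, so effective latency dominates. */
void matmul_naive(double a[N][N], double b[N][N], double c[N][N]) {
    memset(c, 0, sizeof(double) * N * N);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Blocked (tiled) version: the same arithmetic, reordered over BxB tiles
   so each tile of a, b, and c is reused while it is still in cache. */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N]) {
    memset(c, 0, sizeof(double) * N * N);
    for (int ii = 0; ii < N; ii += B)
        for (int kk = 0; kk < N; kk += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int k = kk; k < kk + B; k++)
                        for (int j = jj; j < jj + B; j++)
                            c[i][j] += a[i][k] * b[k][j];
}
```

Both routines compute the same product; only the traversal order, and hence the cache behavior, differs.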

Dynamic Characteristics of Multithreaded Execution in the EM-X Multiprocessor

Multithreading is known to be effective for tolerating communication latency in distributed-memory multiprocessors. Two types of support for multithreading have been used to date, including software and hardware. This paper presents the impact of multithreading on performance through empirical studies. In particular, we explicate the performance difference between software support and hardware suppor...

Full text

An Efficient Architecture for Loop Based Data Preloading

Cache prefetching with the assistance of an optimizing compiler is an effective means of reducing the penalty of long memory access time beyond the primary cache. However, cache prefetching can cause cache pollution and its benefit can be unpredictable. A new architectural support for preloading, the preload buffer, is proposed in this paper. Unlike previously proposed methods of nonbinding cache ...

Full text
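The nonbinding property this abstract mentions, a prefetch that can be dropped without affecting correctness, is what compiler-directed preloading relies on. A minimal software analogue in C uses the GCC/Clang `__builtin_prefetch` hint; the prefetch distance PF_DIST below is a tuning assumption, not a value from the paper:

```c
#include <stddef.h>

/* Sum an array, issuing a nonbinding prefetch a fixed distance ahead.
   __builtin_prefetch (a GCC/Clang extension) may pull the line into
   cache early but never faults and never changes the result, mirroring
   the "nonbinding" prefetch discussed above. */
#define PF_DIST 16   /* elements of lookahead; an illustrative tuning choice */

double sum_with_prefetch(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PF_DIST < n)
            __builtin_prefetch(&x[i + PF_DIST], /*rw=*/0, /*locality=*/1);
        s += x[i];
    }
    return s;
}
```

Because the hint is advisory, the loop computes the same sum whether or not the hardware honors the prefetch.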

Design and Evaluation of a Subblock Cache Coherence Protocol for Bus-Based Multiprocessors

Parallel applications exhibit a wide variety of memory reference patterns. Designing a memory architecture that serves all applications well is not easy. However, because tolerating or reducing memory latency is a priority in effective parallel processing, it is important to explore new techniques to reduce memory traffic. In this paper, we describe a snoopy cache coherence protocol that uses a la...

Full text


Journal title:

Volume   Issue

Pages  -

Publication date: 1997