IPU/LTB: A Method for Reducing Effective Memory Latency
Authors
Abstract
This paper describes a new hardware approach to data and instruction prefetching for superscalar processors. The key innovations are instruction prefetching driven by predicted procedural control flow, and the decoupling of data prefetching from instruction prefetching. Simulation results show that the method recovers 72% of unnecessarily lost cache cycles and yields a 20-27% improvement over previous hardware prefetching techniques. The technique has a relatively small hardware cost and is intended to sit between the processor and the level-1 cache.
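The abstract does not give the internal organization of the IPU/LTB hardware, but the idea of predicting procedural control flow at cache-line granularity can be sketched in simulation. The structure below (a small "line target buffer" that learns which line followed each line last time, and otherwise falls back to sequential prefetch) is a hypothetical illustration; all names and parameters are assumptions, not the paper's design.

```python
# Minimal sketch of a line-target-buffer style instruction prefetcher.
# Illustrative only: the class name, line size, and learning rule are
# assumptions, not the IPU/LTB organization from the paper.

class LineTargetBuffer:
    """Maps a cache-line address to the line that followed it last time,
    so the predicted next line can be prefetched ahead of demand fetch."""

    def __init__(self, line_size=32):
        self.line_size = line_size
        self.targets = {}       # line address -> last observed successor line
        self.prev_line = None   # previous line touched, for learning

    def access(self, pc):
        """Record a fetch at address pc; return the line to prefetch next."""
        line = pc - (pc % self.line_size)
        # Learn the observed line-to-line transition (covers taken branches).
        if self.prev_line is not None and self.prev_line != line:
            self.targets[self.prev_line] = line
        self.prev_line = line
        # Predict the learned target, else fall back to the sequential line.
        return self.targets.get(line, line + self.line_size)
```

On a trace that loops through a taken branch (e.g. lines 0, 32, 96), the buffer initially predicts sequentially but, once trained, prefetches the branch-target line 96 after line 32 instead of the sequential line 64.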
Similar resources
Hardware and Software Mechanisms for Reducing Load Latency
As processor demands quickly outpace memory, the performance of load instructions becomes an increasingly critical component of good system performance. This thesis contributes four novel load latency reduction techniques, each targeting a different component of load latency: address calculation, data cache access, address translation, and data cache misses. The contributed techniques are as fol...
Blocking Linear Algebra Codes for Memory Hierarchies
Because computation speed and memory size are both increasing, the latency of memory, in basic machine cycles, is also increasing. As a result, recent compiler research has focused on reducing the effective latency by restructuring programs to take more advantage of high-speed intermediate memory (or cache, as it is usually called). The problem is that many real-world programs are non-trivial to...
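The restructuring this abstract refers to is classically loop blocking (tiling). As a hedged illustration, not taken from the cited thesis: a matrix multiply can be reorganized so each small tile of the operands is reused while it is still cache-resident; the block size `bs` is a tuning parameter chosen here arbitrarily.

```python
# Illustrative loop blocking (tiling) of an n x n matrix multiply.
# The block size bs is an assumed tuning parameter, not a value from
# the text; in practice it is chosen to fit tiles in cache.

def matmul_blocked(A, B, n, bs=4):
    """Compute C = A * B with a blocked triple loop."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Each pass touches only one bs x bs tile of A, B, and C,
                # so the tiles are reused before being evicted from cache.
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a_ik = A[i][k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a_ik * B[k][j]
    return C
```

The result is identical to an unblocked multiply; only the order in which memory is touched changes, which is what reduces the effective latency.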
Dynamic Characteristics of Multithreaded Execution in the EM-X Multiprocessor
Multithreading is known to be effective for tolerating communication latency in distributed-memory multiprocessors. Two types of support for multithreading have been used to date: software and hardware. This paper presents the impact of multithreading on performance through empirical studies. In particular, we explicate the performance difference between software support and hardware suppor...
An Efficient Architecture for Loop Based Data Preloading
Cache prefetching with the assistance of an optimizing compiler is an effective means of reducing the penalty of long memory access time beyond the primary cache. However, cache prefetching can cause cache pollution and its benefit can be unpredictable. A new architectural support for preloading, the preload buffer, is proposed in this paper. Unlike previously proposed methods of nonbinding cache ...
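The pollution-avoidance idea can be sketched as follows: preloaded lines go into a small buffer beside the cache, and only move into the cache if a demand load actually uses them, so useless preloads displace nothing from the cache. This sketch is a hypothetical illustration; the abstract does not specify the preload buffer's size, replacement policy, or interface.

```python
# Hypothetical sketch of a preload buffer: compiler-inserted preloads fill
# a small FIFO buffer beside the cache; unused preloads fall out of the
# buffer instead of polluting the primary cache. Sizes and policies are
# assumptions, not the paper's design.

from collections import OrderedDict

class PreloadBuffer:
    def __init__(self, entries=8):
        self.entries = entries
        self.buf = OrderedDict()   # address -> data, FIFO replacement

    def preload(self, addr, data):
        """Nonbinding preload: fills the buffer, never the cache."""
        if addr in self.buf:
            return
        if len(self.buf) >= self.entries:
            self.buf.popitem(last=False)   # evict oldest preload only
        self.buf[addr] = data

    def load(self, addr, cache):
        """A demand load checks the buffer first; a hit promotes the
        line into the cache only because it was actually used."""
        if addr in self.buf:
            cache[addr] = self.buf.pop(addr)
        return cache.get(addr)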
Design and Evaluation of a Subblock Cache Coherence Protocol for Bus-Based Multiprocessors
Parallel applications exhibit a wide variety of memory reference patterns. Designing a memory architecture that serves all applications well is not easy. However, because tolerating or reducing memory latency is a priority in effective parallel processing, it is important to explore new techniques to reduce memory traffic. In this paper, we describe a snoopy cache coherence protocol that uses a la...