A Decoupled Predictor-Directed Stream Prefetching Architecture
Authors
Suleyman Sair, Timothy Sherwood, Brad Calder
Abstract
An effective method for reducing the effect of load latency in modern processors is data prefetching. One form of hardware-based data prefetching, stream buffers, has been shown to be particularly effective due to its ability to detect data streams and run ahead of them, prefetching as it goes. Unfortunately, in the past, the applicability of streaming was limited to stride-intensive code. In this paper we propose Predictor-Directed Stream Buffers (PSB), which allows the stream buffer to follow a general address prediction stream instead of a fixed stride. A general address prediction stream complicates the allocation of both stream buffer and memory resources, because the predictions generated will not be as reliable as those of prior sequential next-line and stride-based stream buffer implementations. To address this, we examine confidence-based techniques to guide the allocation and prioritization of stream buffers and their prefetch requests. Our results show that, on a benchmark suite heavy in pointer-based applications, PSB provides a 23% speedup on average over the best previous stream buffer implementation, and a 75% improvement over using no prefetching at all.
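To make the idea concrete, the sketch below models, in simplified form, a stream buffer driven by a general address predictor, with a saturating confidence counter gating which streams may issue prefetches. It is not the paper's implementation: the predictor shown is a plain stride predictor standing in for a more general one, and the 2-bit counter, its threshold, and all identifiers (StridePredictor, StreamBuffer, may_prefetch) are assumptions made for the example.

```cpp
// Toy model of a predictor-directed stream buffer: a saturating confidence
// counter decides whether the predicted stream may keep issuing prefetches.
#include <algorithm>
#include <cstdint>
#include <iostream>

struct StridePredictor {              // stand-in for a general address predictor (assumed)
    uint64_t last = 0;
    int64_t  stride = 0;
    uint64_t predict_next(uint64_t addr) {
        stride = static_cast<int64_t>(addr - last);   // learn the most recent delta
        last = addr;
        return addr + stride;                         // follow whatever pattern it implies
    }
};

struct StreamBuffer {
    uint64_t predicted = 0;   // last address this stream predicted/prefetched
    bool     valid = false;
    int      confidence = 2;  // 2-bit saturating counter (0..3), assumed policy

    void train(uint64_t demand_addr) {                // called on each demand access
        if (!valid) return;
        if (predicted == demand_addr) confidence = std::min(confidence + 1, 3);
        else                          confidence = std::max(confidence - 1, 0);
    }
    bool may_prefetch() const { return confidence >= 2; }  // gate unreliable streams
};

int main() {
    StridePredictor pred;
    StreamBuffer sb;
    uint64_t addr = 0x1000;
    for (int i = 0; i < 8; ++i) {
        sb.train(addr);                       // was the last prediction right?
        uint64_t next = pred.predict_next(addr);
        sb.predicted = next;                  // keep predicting so confidence can recover
        sb.valid = true;
        if (sb.may_prefetch())                // only confident streams spend bandwidth
            std::cout << "prefetch 0x" << std::hex << next
                      << " (confidence " << std::dec << sb.confidence << ")\n";
        addr += 64;                           // demand stream walks one 64-byte line
    }
}
```

The counter captures the abstract's point that a general prediction stream is less reliable than a fixed stride: streams whose predictions keep missing the demand stream lose confidence and stop competing for prefetch bandwidth until they start predicting correctly again.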
Similar resources
Optimizations Enabled by a Decoupled Front-End Architecture
In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the execution core. Attaining these targets is a challenging task due to I-cache misses, branch mispredictio...
Branch-directed and pointer-based data cache prefetching
The design of the on-chip cache memory and branch prediction logic has become an integral part of a microprocessor implementation. Branch predictors reduce the effects of control hazards on pipeline performance. Branch prediction implementations have been proposed which eliminate a majority of the pipeline stalls associated with branches. Caches are commonly used to reduce the performance gap b...
HiDISC: A Decoupled Architecture for Applications in Data Intensive Computing
The ever-growing speed gap between processor and main memory has been a major performance bottleneck of modern computer systems. As a result, today's data-intensive applications suffer from frequent cache misses and lose many CPU cycles due to pipeline stalling. Although traditional prefetching methods reduce cache misses considerably, most of them strongly depend on the access pattern being pr...
The Decoupled-Style Prefetch Architecture
Decoupled processing seeks to dynamically schedule memory accesses in order to tolerate memory latency by prefetching operands. Since decoupled processors cannot speculatively issue memory operations, control-flow operations can significantly impact their ability to prefetch data. The prefetching architecture proposed here seeks to leverage the dynamic scheduling benefits of decoupled processi...
Second-level Cache Organization for Data Prefetching
This paper studies hardware prefetching for second-level (L2) caches. Previous work on prefetching has been extensive but largely directed at primary caches. In some cases only L2 prefetching is possible or is more appropriate. We concentrate on stride-directed prefetching and study stream buffers and L2 cache prefetching. We show that proposed stride-directed organizations/prefetching algorithm...
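The stride-directed baseline mentioned in this snippet (and the kind of scheme PSB generalizes) can be sketched in a few lines. The sketch below uses a per-PC table that remembers each load's last address and stride and prefetches once the same non-zero stride repeats; the table layout, the "steady" rule, and all identifiers are illustrative assumptions, not the organization evaluated in any of these papers.

```cpp
// Minimal stride-directed prefetch sketch: a per-PC table remembers the last
// address and stride for each load; once the same non-zero stride repeats,
// the next address in the pattern is prefetched.
#include <cstdint>
#include <iostream>
#include <unordered_map>

struct Entry { uint64_t last = 0; int64_t stride = 0; bool steady = false; };

std::unordered_map<uint64_t, Entry> table;   // indexed by the load's PC (assumed)

// Returns a prefetch address, or 0 when no stable stride has been seen yet.
uint64_t on_access(uint64_t pc, uint64_t addr) {
    Entry& e = table[pc];
    int64_t stride = static_cast<int64_t>(addr - e.last);
    e.steady = (stride != 0 && stride == e.stride);   // same stride twice in a row
    e.stride = stride;
    e.last = addr;
    return e.steady ? addr + stride : 0;
}

int main() {
    // One load (PC 0x400, assumed) walking an array with a 128-byte stride.
    for (uint64_t a = 0x8000; a < 0x8400; a += 128)
        if (uint64_t p = on_access(0x400, a))
            std::cout << "prefetch 0x" << std::hex << p << "\n";
}
```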
Journal: IEEE Trans. Computers
Volume: 52, Issue: -
Pages: -
Year of publication: 2003