Enhancements for Accurate and Timely Streaming Prefetcher

Authors

  • Gang Liu
  • Zhuo Huang
  • Jih-Kwon Peir
  • Xudong Shi
  • Lu Peng
Abstract

In this paper, we describe several enhancement techniques to improve the state-of-the-art stream prefetcher. First, the enhanced stream prefetcher takes streams with long strides into consideration to avoid wasteful prefetches. Second, accessing a node in a tree or graph structure may proceed in a different direction than the traversal through the structure itself. The enhanced stream prefetcher filters out this type of noise when establishing a stream. Third, regular streams for array accesses are often repeated. The initiation penalty can be avoided by re-establishing a repeated stream early. Fourth, an established stream may be dead before it is removed from the stream prefetching table. A dead stream removal scheme reduces inaccurate prefetches. Performance evaluations based on SPEC applications show that the enhanced stream prefetcher improves CPI by 38%, 42%, and 55% for the three tested cache configurations provided by the 1st JILP Data Prefetching Championship Committee [19], relative to the base design without prefetching. In comparison with the original stream prefetcher, the improvements are 2%, 18%, and 19%, respectively.
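The baseline idea the abstract builds on can be illustrated with a minimal sketch of a generic stream prefetcher with a simple idle-based dead-stream eviction. This is not the authors' design: the class and parameter names (`StreamPrefetcher`, `window`, `train`, `degree`, `dead_after`), the 64-byte line size, and the eviction rule are all assumptions chosen for illustration, and the long-stride, noise-filtering, and stream re-establishment enhancements are not modeled.

```python
from dataclasses import dataclass

LINE_BYTES = 64  # assumed cache line size


@dataclass
class Stream:
    last_line: int       # most recent cache line seen in this stream
    direction: int = 0   # +1 ascending, -1 descending, 0 unknown
    confidence: int = 0  # consecutive accesses agreeing on direction
    idle: int = 0        # accesses observed since this entry last matched


class StreamPrefetcher:
    """Hypothetical, simplified stream table (illustration only)."""

    def __init__(self, window=4, train=2, degree=2, dead_after=16):
        self.window = window          # max line distance to join a stream
        self.train = train            # confidence needed before prefetching
        self.degree = degree          # lines prefetched per trigger
        self.dead_after = dead_after  # idle accesses before an entry is dropped
        self.streams = []

    def access(self, addr):
        """Observe one demand access; return the line numbers to prefetch."""
        line = addr // LINE_BYTES
        # Age every entry, then drop entries that look dead.
        for s in self.streams:
            s.idle += 1
        self.streams = [s for s in self.streams if s.idle <= self.dead_after]

        for s in self.streams:
            delta = line - s.last_line
            if abs(delta) > self.window:
                continue
            s.idle = 0
            if delta == 0:  # same line again: nothing new to learn
                return []
            step = 1 if delta > 0 else -1
            if step == s.direction:
                s.confidence += 1
            else:  # first observation or a direction change: retrain
                s.direction, s.confidence = step, 1
            s.last_line = line
            if s.confidence >= self.train:
                return [line + step * k for k in range(1, self.degree + 1)]
            return []

        # No nearby stream: start tracking a new one.
        self.streams.append(Stream(last_line=line))
        return []
```

With the default parameters, three sequential accesses to addresses 0, 64, and 128 train one ascending stream, and the third access returns lines 3 and 4 as prefetch candidates. An entry that goes unmatched for more than `dead_after` accesses is dropped, which is one simple way to stop issuing prefetches for a stream that is no longer live.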

Similar resources

A Branch-directed Data Cache Prefetching Technique for Inorder Processors

Reena Panda, B.Tech, NIT Rourkela, India (December 2011). Co-chairs of advisory committee: Dr. Paul V. Gratz and Dr. Jiang Hu. The increasing gap between processor and main memory speeds has become a serious bottleneck to further improvement in system performance. Data prefetching techniques have been proposed to hide the...

Performance Oriented Prefetching Enhancements Using Commit Stalls

Loads that miss in the L1 or L2 caches and wait for their data at the head of the ROB cause significant slowdown in the form of commit stalls. We identify that most of these commit stalls are caused by a small set of loads, referred to as LIMCOS (Loads Incurring Majority of COmmit Stalls). We propose simple history-based classifiers that track commit stalls suffered by loads to help us id...

An Observed Study on Improved Caching by Adaptive and Partial Aggressive Prefetching

This paper presents an observational study of the advantages of adaptive prefetching with proxy caching for large multimedia streaming. The adaptive and partial prefetching method fetches media chunks dynamically based on the user access pattern in the proxy servers and updates the current access pattern on the media server. The study analyzed the proxy caching study of iRcache and appl...

A Best-Offset Prefetcher

The Best-Offset (BO) prefetcher submitted to the DPC2 contest prefetches one line into the level-two (L2) cache on every cache miss or hit on a prefetched line. The prefetch line address is generated by adding an offset to the demand access address. The BO prefetcher tries to automatically find an offset value that yields timely prefetches with the highest possible coverage and accuracy. It eva...

Towards Memory Prefetching with Neural Networks: Challenges and Insights

Accurate memory prefetching is paramount for processor performance, and modern processors employ various techniques to identify and prefetch different memory access patterns. While most modern prefetchers target spatio-temporal patterns by matching memory addresses that are accessed in close proximity (either in space or time), the recently proposed concept of semantic locality views locality a...

Journal:
  • J. Instruction-Level Parallelism

Volume 13

Publication date: 2011