Improving Multiple-block Prediction in the Block-based Trace Cache
Abstract
Multiple-block prediction is emerging as a new and exciting research area. Highly accurate multiple-block predictors are essential for the wide instruction fetch mechanisms that will support future generations of microprocessors. The block-based trace cache is a recent proposal for wide instruction fetch. It aligns and stores instructions at the basic block level instead of at the trace level, thus significantly reducing instruction trace storage requirements. This paper investigates a new mechanism, the tree-based multiple-block predictor, that utilizes multiple-branch prediction techniques to improve trace construction in the fill unit. This new fill-time predictor does not replace the fetch-time path-based next-trace predictor but augments it by improving the trace construction heuristic. The tree-based multiple-block predictor utilizes a tree structure that represents all possible paths beginning at a root block, where each tree node is a branch predictor. Both a bimodal scheme and two-level adaptive schemes are examined for these tree-node branch predictors. Results: The enhanced trace constructor using the tree-based multiple-block predictor improves performance of the SPECint95 benchmarks by 20% over [1]. A two-level adaptive predictor is observed to outperform bimodal by 6%. Finally, the block-based trace cache with the enhanced trace constructor improves performance 8% beyond that of perfect branch prediction and outperforms the conventional trace cache, with a perfect predictor, for instruction storage capacities up to 100KB.
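The tree structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class name `TreePredictor`, the heap-style node indexing, the fixed tree depth, and the use of 2-bit saturating counters (one common bimodal scheme) as the per-node predictors are all assumptions made for illustration.

```python
class TreePredictor:
    """Hypothetical tree of 2-bit saturating counters rooted at one block.

    Each node predicts one branch; following the predicted outcomes from
    the root yields a predicted multi-block path.
    """

    def __init__(self, depth):
        self.depth = depth
        # Heap layout (an assumption): node i has children
        # 2*i + 1 (not taken) and 2*i + 2 (taken).
        # Counters start at 1, i.e. weakly not taken.
        self.counters = [1] * (2 ** depth - 1)

    def predict_path(self):
        """Return the predicted taken/not-taken outcome at each tree level."""
        path, node = [], 0
        for _ in range(self.depth):
            taken = self.counters[node] >= 2
            path.append(taken)
            node = 2 * node + (2 if taken else 1)
        return path

    def update(self, outcomes):
        """Train only the nodes that lie on the path actually executed."""
        node = 0
        for taken in outcomes[:self.depth]:
            c = self.counters[node]
            self.counters[node] = min(3, c + 1) if taken else max(0, c - 1)
            node = 2 * node + (2 if taken else 1)


predictor = TreePredictor(depth=3)
predictor.update([True, False, True])
print(predictor.predict_path())
```

In this sketch a fill unit would keep one such tree per root block and use `predict_path()` to choose which blocks to place in a trace; a two-level adaptive variant would replace each counter with a history-indexed table of counters.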
Similar resources
Performance Limits of Trace Caches
A growing number of studies have explored the use of trace caches as a mechanism to increase instruction fetch bandwidth. The trace cache is a memory structure that stores statically non-contiguous but dynamically adjacent instructions in contiguous memory locations. When coupled with an aggressive trace or multiple branch predictor, it can fetch multiple basic blocks per cycle using a single-p...
Predicting Last-Touch References under Optimal Replacement
Effective cache replacement is becoming an increasingly important issue in cache hierarchy design as large set-associative caches are widely used in high-performance systems. This paper proposes a novel approach to approximate the decisions made by an optimal replacement algorithm (OPT) using last-touch prediction. The central idea is to identify, via prediction, the final reference to a cache ...
University Wednesday, 10 May 2000 Trace Cache
Due to unfortunate circumstances this lecture was not scribed, following are several points that I remember were brought up. If anyone has something to add please tell me. In this session we discussed three papers: Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism-describes several enhancements to the original University of Michigan view of the trace cache. Path-Based Nex...
Utilizing Block Size Variability to Enhance Instruction Fetch Rate
In the past, instruction fetch speeds have been improved by using cache schemes that capture the actual program flow. In this paper, we elaborate on the architecture and operation of an instruction cache named Variable-Sized Block Cache (VSBC) that also makes use of the dynamic behavior of a program. Current trace-based cache schemes usually have some instructions stored repeatedly; this redund...
Critical Issues Regarding the Trace Cache Fetch Mechanism
In order to meet the demands of wider issue processors, fetch mechanisms will need to fetch multiple basic blocks per cycle. The trace cache supplies several basic blocks each cycle by storing logically contiguous instructions in physically contiguous storage. When a particular basic block is requested, the trace cache can potentially respond with the requested block along with several blocks t...