Design of Trace Caches for High Bandwidth Instruction Fetching
Abstract
In modern high-performance microprocessors, there has been a trend toward increased superscalarity and deeper speculation to extract instruction-level parallelism. As issue rates rise, more aggressive instruction fetch mechanisms are needed to fetch multiple basic blocks in a given cycle. One such fetch mechanism that shows a great deal of promise is the trace cache, originally proposed by Rotenberg et al. In this thesis, critical design issues regarding the trace cache fetch mechanism are explored in order to develop techniques to further improve trace cache performance. The thesis research presents an optimized trace cache design that shows an average 34.9% improvement for integer benchmarks and 11.0% improvement for floating-point benchmarks, relative to the originally proposed trace cache design. This corresponds to a 67.9% and 16.3% improvement in fetch bandwidth over a traditional instruction cache, for integer and floating-point benchmarks respectively. The results demonstrate the viability of the trace cache as a high-performance fetch mechanism and provide justification for additional research.
Thesis Supervisor: Arvind. Title: Professor of Computer Science and Engineering
Similar resources
Out-of-Order Instruction Fetch Using Multiple Sequencers
Conventional instruction fetch mechanisms fetch contiguous blocks of instructions in each cycle. They are difficult to scale since taken branches make it hard to increase the size of these blocks beyond eight instructions. Trace caches have been proposed as a solution to this problem, but they use cache space inefficiently. We show that fetching large blocks of contiguous instructions, or wide ...
University Wednesday, 10 May 2000 Trace Cache
Due to unfortunate circumstances this lecture was not scribed; the following are several points that I remember being brought up. If anyone has something to add, please tell me. In this session we discussed three papers: Alternative Fetch and Issue Policies for the Trace Cache Fetch Mechanism, which describes several enhancements to the original University of Michigan view of the trace cache. Path-Based Nex...
A Smart Cache for Improved Vector Performance
As the speed of microprocessors increases at a breathtaking rate, the gap between processor and memory system performance is widening. To alleviate this problem, all modern processors contain caches, but even with caches, processors cannot achieve their peak performance. We propose a mechanism, smart caching, which extends the power of conventional memory subsystems by including a prefet...
A Trace Cache Microarchitecture and Evaluation
As the instruction issue width of superscalar processors increases, instruction fetch bandwidth requirements will also increase. It will eventually become necessary to fetch multiple basic blocks per clock cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. Trace caches overcome this limitation by caching tra...
Instruction Cache Designs for a Class of Statically Scheduled Instruction Level Parallel Architectures
Statically scheduled architectures such as very long instruction word (VLIW) architectures use very wide instruction words in conjunction with high bandwidth to the instruction cache to achieve multiple instruction issue. The encoding used for the instructions can have an effect on the requirements placed on the instruction fetch and instruction cache hardware. One type of encoding is a compress...