Search results for: instruction fetch

Number of results: 42508

2001
D. I. August, K. Keutzer, S. Malik, A. R. Newton

ion, and thus more compact specification. It also directly provides support for VLIW compilers in terms of clear description of instruction level parallelism. MAD and Expression are similar in their goals of providing support for simulators and compilers for VLIW processors. However, MAD has less redundancy in the description and thus fewer issues of consistency in description. 6.3 Liberty Simu...

2004
Francisco J. Cazorla, Enrique Fernandez, Alex Ramirez, Mateo Valero

SMT processors increase performance by executing instructions from several threads simultaneously. These threads use the processor’s resources better by sharing them, but, at the same time, threads are competing for these resources. The way critical resources are distributed among threads determines the final throughput and also the performance of each individual thread. Currently, the processo...
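The snippet stops before the policy details; purely as an illustration of how a fetch-stage heuristic can steer a shared resource among threads, the sketch below implements an ICOUNT-style thread selection (the classic SMT fetch policy of Tullsen et al., not necessarily the mechanism studied in this paper). The preissue_count array, the thread count, and the sample values are hypothetical.

    /* Illustrative ICOUNT-style SMT fetch selection (not this paper's mechanism).
     * Each cycle, fetch from the thread with the fewest instructions occupying
     * the pre-issue pipeline stages, so no single thread monopolizes them. */
    #include <stdio.h>

    #define NTHREADS 4

    /* hypothetical per-thread count of instructions in pre-issue stages */
    static int preissue_count[NTHREADS] = {12, 3, 7, 9};

    static int pick_fetch_thread(void)
    {
        int best = 0;
        for (int t = 1; t < NTHREADS; t++)
            if (preissue_count[t] < preissue_count[best])
                best = t;
        return best;
    }

    int main(void)
    {
        printf("fetch from thread %d this cycle\n", pick_fetch_thread());
        return 0;
    }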

1998
Luca Benini, Giovanni De Micheli, Alberto Macii, Enrico Macii, Massimo Poncino

With the increased clock frequency of modern, high-performance processors (over 500 MHz, in some cases), limiting the power dissipation has become the most stringent design target. It is thus mandatory for processor engineers to resort to a large variety of optimization techniques to reduce the power requirements in the hot zones of the chip. In this paper, we focus on the power dissipated by t...
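The snippet is cut off before naming the hot zone under study; as textbook background rather than a result of this paper, optimizations of this kind push against the first-order CMOS switching-power term, with activity factor \alpha, switched capacitance C, supply voltage V_{dd}, and clock frequency f:

    P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f

Techniques that gate or shrink fetch-path structures reduce \alpha and C, while voltage scaling attacks the quadratic V_{dd} term.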

1997
Sanjay Jeram Patel, Daniel Holmes

In order to meet the demands of wider issue processors, fetch mechanisms will need to fetch multiple basic blocks per cycle. The trace cache supplies several basic blocks each cycle by storing logically contiguous instructions in physically contiguous storage. When a particular basic block is requested, the trace cache can potentially respond with the requested block along with several blocks t...
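As a toy sketch of the lookup half of that mechanism (sizes, field names, and the indexing scheme below are assumptions, not the authors' design), a trace line can be indexed by the starting fetch address and tagged with that address plus the branch outcomes recorded when the trace was built; on a hit, the whole multi-block trace is delivered at once.

    /* Toy trace-cache lookup: a line is tagged by its start PC plus the
     * branch-outcome path taken when the trace was built.  Sizes and field
     * names are illustrative, not from the paper. */
    #include <stdint.h>
    #include <stdbool.h>
    #include <string.h>

    #define TC_SETS      256
    #define TRACE_INSNS  16          /* up to 16 instructions per trace   */

    struct trace_line {
        bool     valid;
        uint32_t start_pc;           /* fetch address the trace begins at */
        uint8_t  branch_mask;        /* taken/not-taken bits along trace  */
        uint8_t  nbranches;
        uint32_t insns[TRACE_INSNS]; /* the logically contiguous blocks   */
        int      len;
    };

    static struct trace_line tcache[TC_SETS];

    /* Returns number of instructions supplied this cycle, 0 on a miss. */
    static int trace_cache_fetch(uint32_t pc, uint8_t predicted_path,
                                 uint32_t *out)
    {
        struct trace_line *l = &tcache[(pc >> 2) % TC_SETS];
        uint8_t mask = (uint8_t)((1u << l->nbranches) - 1);

        if (l->valid && l->start_pc == pc &&
            (predicted_path & mask) == (l->branch_mask & mask)) {
            memcpy(out, l->insns, l->len * sizeof(uint32_t));
            return l->len;           /* several basic blocks in one cycle */
        }
        return 0;                    /* fall back to the instruction cache */
    }

    int main(void)
    {
        uint32_t buf[TRACE_INSNS];
        struct trace_line *l = &tcache[(0x1000u >> 2) % TC_SETS];
        *l = (struct trace_line){ .valid = true, .start_pc = 0x1000u,
                                  .branch_mask = 0x1, .nbranches = 1, .len = 6 };
        return trace_cache_fetch(0x1000u, 0x1, buf) == 6 ? 0 : 1;
    }

A real design also needs a fill unit that builds traces from retired instructions and a fallback path to the conventional instruction cache on a miss.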

2007
Zhou Hongwei, Zhang Chengyi, Zhang Minxuan

In this paper, we propose a phased drowsy instruction cache with on-demand wakeup prediction policy (called “phased on-demand policy”) to reduce the leakage and dynamic energy with less performance overhead. As in prior non-phased on-demand policy, an extra stage for wakeup is inserted before the fetch stage in pipeline. The drowsy cache lines are woken up in wakeup stage and the wakeup latency...
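A minimal sketch of the underlying drowsy-line bookkeeping, assuming made-up sizes and a uniform one-cycle wakeup penalty; the on-demand wakeup prediction the paper actually proposes is not modeled here.

    /* Minimal sketch of drowsy instruction-cache line bookkeeping: lines
     * fall into a low-voltage "drowsy" state after an idle window and must
     * be woken up (one extra cycle here) in a wakeup stage before fetch. */
    #include <stdio.h>

    #define NLINES        512
    #define DROWSY_AFTER  64          /* idle cycles before a line goes drowsy */

    enum line_state { AWAKE, DROWSY };

    static enum line_state state[NLINES];
    static int idle_cycles[NLINES];

    /* Called once per cycle for every line: decay toward the drowsy state. */
    static void tick(void)
    {
        for (int i = 0; i < NLINES; i++)
            if (state[i] == AWAKE && ++idle_cycles[i] >= DROWSY_AFTER)
                state[i] = DROWSY;
    }

    /* Wakeup stage, inserted before fetch: returns the extra latency (cycles)
     * paid before the indexed line can actually be read. */
    static int wakeup_stage(int line)
    {
        int penalty = (state[line] == DROWSY) ? 1 : 0;
        state[line] = AWAKE;
        idle_cycles[line] = 0;
        return penalty;
    }

    int main(void)
    {
        int extra = 0;
        for (int cycle = 0; cycle < 1000; cycle++) {
            tick();
            extra += wakeup_stage(cycle % 128);  /* toy access pattern */
        }
        printf("extra wakeup cycles: %d\n", extra);
        return 0;
    }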

Journal: Journal of Circuits, Systems, and Computers, 2007
Azam Beg, Yul Chu

Recent cache schemes, such as the trace cache, (fixed-sized) block cache, and variable-sized block cache, have helped improve instruction fetch bandwidth beyond the conventional instruction caches. Trace- and block-caches function by capturing the dynamic sequence of instructions. For industry standard benchmarks (e.g., SPEC2000), performance comparison of various configurations of these caches using...

2007
Yongfeng Pan, Xiaoya Fan, Liqiang He, Deli Wang

Unlike traditional superscalar processors, a Simultaneous Multithreaded (SMT) processor can exploit both instruction-level parallelism and thread-level parallelism at the same time. With the same fetch width, an SMT processor fetches instructions from a single thread less deeply than a traditional superscalar processor does. Meanwhile, all the instructions from different threads share the same functional units in SMT. All...

Journal: Softw., Pract. Exper., 1995
Rod Adams, Sue M. Gray

Multiple-instruction-issue processors seek to improve performance over scalar RISC processors by providing multiple pipelined functional units in order to fetch, decode and execute several instructions per cycle. The process of identifying instructions which can be executed in parallel and distributing them between the available functional units is referred to as instruction scheduling. This pa...
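As a rough sketch of the core scheduling idea (a generic greedy list-scheduling pass with unit latencies and a made-up dependence graph, not any specific algorithm the paper surveys): each cycle, instructions whose predecessors have completed are packed into an issue group, up to the issue width.

    /* Greedy list scheduling over a tiny dependence graph: each cycle, issue
     * up to ISSUE_WIDTH instructions whose predecessors have all completed.
     * Illustrative only; all latencies are one cycle. */
    #include <stdio.h>
    #include <stdbool.h>

    #define NINSNS       6
    #define ISSUE_WIDTH  2

    /* deps[i][j] == true means instruction i must wait for instruction j. */
    static const bool deps[NINSNS][NINSNS] = {
        /* i2 depends on i0, i3 on i1, i4 on i2 and i3, i5 on i4 */
        [2] = { [0] = true },
        [3] = { [1] = true },
        [4] = { [2] = true, [3] = true },
        [5] = { [4] = true },
    };

    int main(void)
    {
        bool done[NINSNS] = { false };
        int remaining = NINSNS;

        for (int cycle = 0; remaining > 0; cycle++) {
            bool issued_now[NINSNS] = { false };
            int slots = ISSUE_WIDTH;

            for (int i = 0; i < NINSNS && slots > 0; i++) {
                if (done[i])
                    continue;
                bool ready = true;
                for (int j = 0; j < NINSNS; j++)
                    if (deps[i][j] && !done[j])
                        ready = false;
                if (ready) {
                    issued_now[i] = true;
                    slots--;
                    printf("cycle %d: issue i%d\n", cycle, i);
                }
            }
            for (int i = 0; i < NINSNS; i++)
                if (issued_now[i]) { done[i] = true; remaining--; }
        }
        return 0;
    }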

2000
Jan Hoogerbrugge, Lex Augusteijn

The performance of a Java Virtual Machine (JVM) interpreter running on a very long instruction word (VLIW) processor can be improved by means of pipelining. While one bytecode is in its execute stage, the next bytecode is in its decode stage, and the next bytecode is in its fetch stage. The paper describes how we implemented threading and pipelining by rewriting the source code of the interpret...
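A minimal sketch of the threading half of that idea in portable C, using function-pointer dispatch with an invented four-opcode bytecode; the paper's actual rewrite targets a VLIW pipeline and overlaps the fetch and decode of bytecode n+1 with the execute of bytecode n, which is not reproduced here.

    /* Token-threaded bytecode dispatch via function pointers: each handler
     * ends by fetching the next opcode and entering its handler, instead of
     * returning to one central switch loop.  Bytecode set is made up. */
    #include <stdio.h>

    enum { OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    static const unsigned char code[] = { OP_PUSH1, OP_PUSH1, OP_ADD, OP_PRINT, OP_HALT };

    static int stack[16], sp = 0, pc = 0, running = 1;

    static void dispatch(void);   /* fetch next bytecode, enter its handler */

    static void op_push1(void) { stack[sp++] = 1;                dispatch(); }
    static void op_add(void)   { sp--; stack[sp-1] += stack[sp]; dispatch(); }
    static void op_print(void) { printf("%d\n", stack[sp-1]);    dispatch(); }
    static void op_halt(void)  { running = 0; }

    typedef void (*handler_t)(void);
    static const handler_t table[] = { op_push1, op_add, op_print, op_halt };

    static void dispatch(void) { if (running) table[code[pc++]](); }

    int main(void) { dispatch(); return 0; }

Because every handler ends in the same dispatch step, a compiler (or a hand rewrite, as in the paper) has a hook for hoisting the fetch of the next bytecode earlier and pipelining it with the current handler's work.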

1996
Manu Gulati, Nader Bagherzadeh

This paper describes a technique for improving the performance of a superscalar processor through multithreading. The technique exploits the instruction-level parallelism available both inside each individual stream and across streams. The former is exploited through out-of-order execution of instructions within a stream, and the latter through execution of instructions from different streams ...

[Chart: number of search results per year]
