instruction fetch

Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions

2002

Resit Sendag, David J. Lilja, Steven R. Kunkel

As the degree of instruction-level parallelism in superscalar architectures increases, the gap between processor and memory performance continues to grow, requiring more aggressive techniques to increase the performance of the memory system. We propose a new technique, based on the wrong-path execution of loads far beyond instruction fetch-limiting conditional branches, to exploit more ...

Integrated I-cache Way Predictor and Branch Target Buffer to Reduce Energy Consumption

2002

Weiyu Tang, Alexander V. Veidenbaum, Alexandru Nicolau, Rajesh K. Gupta

In this paper, we present a Branch Target Buffer (BTB) design for energy savings in set-associative instruction caches. We extend the functionality of a BTB by caching way predictions in addition to branch target addresses. Way prediction and branch target prediction are done in parallel. Instruction cache energy savings are achieved by accessing one cache way if the way prediction for a fetch i...
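
The idea of storing a way prediction next to each branch target can be sketched roughly as follows. This is a minimal illustration, not the authors' design: the names `BTBEntry`, `WayPredictingBTB`, and `ways_probed` are hypothetical, the table is unbounded, and the number of cache ways probed is used only as a crude stand-in for access energy. It also assumes that a single way is probed when a correct way prediction is available, since the abstract is truncated at that point.

```python
from dataclasses import dataclass


@dataclass
class BTBEntry:
    target: int  # predicted branch target address
    way: int     # predicted I-cache way holding the target line


class WayPredictingBTB:
    """Toy BTB that caches a way prediction alongside each target (hypothetical)."""

    def __init__(self, num_ways=4):
        self.num_ways = num_ways
        self.entries = {}     # branch PC -> BTBEntry (unbounded, for illustration)
        self.ways_probed = 0  # crude proxy for I-cache access energy

    def lookup(self, pc):
        """Return (target, way) from a single lookup, or None on a BTB miss."""
        entry = self.entries.get(pc)
        return (entry.target, entry.way) if entry else None

    def fetch(self, pc, actual_target, actual_way):
        """Account for one fetch, then train the entry with the real outcome."""
        hit = self.lookup(pc)
        if hit is not None and hit == (actual_target, actual_way):
            self.ways_probed += 1              # probe only the predicted way
        else:
            # Simplification: a miss or a wrong prediction is charged
            # a full access to every way.
            self.ways_probed += self.num_ways
        self.entries[pc] = BTBEntry(actual_target, actual_way)
```

In this sketch, the first fetch of a branch probes all ways, and repeated fetches with a stable target and way probe only one.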

The PEWs microarchitecture: reducing complexity through data-dependence based decentralization

Journal: Microprocessors and Microsystems - Embedded Hardware Design, 1998

Narayan Ranganathan, Manoj Franklin

This paper presents a microarchitecture based on exploiting the locality of data dependences for efficiently executing many instructions per cycle. The instruction window is split into multiple hardware units, and the instruction stream is distributed among them in such a way that data dependent instructions are generally allocated to the same unit. The fetch bandwidth of the processor is enhance...

Future Branches - Beyond Speculative Execution

1997

Bill Appelbe, Reid Harmon, Maurizio Vitale, Sri Doddapaneni, Scott Wills

The performance and hardware complexity of superscalar architectures is hindered by conditional branch instructions. When conditional branches are encountered in a program, the instruction fetch unit must rapidly predict the branch predicate and begin speculatively fetching instructions with no loss of instruction throughput. Speculative execution increases hardware cost, since speculative inst...

Explore Be-Nice Instruction Scheduling in Open64 for an Embedded SMT Processor

2008

Handong Ye, Ge Gan, Ziang Hu, Guang R. Gao, Xiaomi An

An SMT processor can fetch and issue instructions from multiple independent hardware threads at every CPU cycle. Hardware resources are therefore shared among the concurrently running threads at a very fine-grained level, which can increase the utilization of the processor pipeline. However, the concurrently running threads in an SMT processor may interfere with each other and stall the CPU pipeline. ...

[Title not recovered; the extracted text is figure residue from a pipeline timing diagram with stages labelled Schedule/Wake-Up, Execute, Send To ALU, and Broadcast Results]

2000

Dana S. Henry, Bradley C. Kuszmaul, Gabriel H. Loh

Our program benchmarks and simulations of novel circuits indicate that large-window processors are feasible. Using our redesigned superscalar components, a large-window processor implemented in today’s technology can achieve an increase of 10–60% (geometric mean of 31%) in program speed compared to today’s processors. The processor operates at clock speeds comparable to today’s processors, but ...

New Algorithm Improves Branch Prediction: 3/27/95

1995

Linley Gwennap

Intel’s P6 processor (see 090202.PDF) is the first to use a two-level branch-prediction algorithm to improve accuracy. This algorithm, first published by Tse-Yu Yeh and Yale Patt, has the potential to push accuracy well beyond the 90% level achieved by the best processors today. As future processors look to improve performance by increasing the issue rate and/or extending the pipeline depth, th...
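
A two-level scheme of the general kind credited above to Yeh and Patt can be sketched as follows. This is a generic illustration under stated assumptions, not the P6 implementation, whose details the excerpt does not give: it assumes a single global history register (first level) that indexes a table of 2-bit saturating counters (second level). The names `TwoLevelPredictor` and `accuracy` and the 4-bit history length are hypothetical.

```python
class TwoLevelPredictor:
    """Generic two-level adaptive predictor sketch (not the P6 design)."""

    def __init__(self, history_bits=4):
        self.history_bits = history_bits
        # First level: outcomes of the most recent branches, newest in bit 0.
        self.history = 0
        # Second level: one 2-bit saturating counter per history pattern,
        # initialised to 1 ("weakly not taken").
        self.counters = [1] * (1 << history_bits)

    def predict(self):
        """Predict taken when the counter for the current history is 2 or 3."""
        return self.counters[self.history] >= 2

    def update(self, taken):
        """Train the selected counter, then shift the outcome into the history."""
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(predictor, outcomes):
    """Fraction of branch outcomes predicted correctly, training as we go."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)
```

For example, `accuracy(TwoLevelPredictor(), [True, True, False] * 100)` mispredicts only during warm-up, because each 4-bit history in that repeating pattern is always followed by the same outcome.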

Overview of the Pipe Processor Implementation

1991

Matthew K. Farrens, Andrew R. Pleszkun

The PIPE processor is an outgrowth of the PIPE Project, a research project at the University of Wisconsin-Madison whose goal was to investigate computer architectures that would be well suited to VLSI implementation. The implemented PIPE processor is a 32-bit pipelined single-chip processor with a simplified load-store instruction set, a 5-stage pipeline, a two-cycle ALU, and the following uniq...

Hierarchical Control Prediction: Support for Aggressive Predication

2009

Hadi Esmaeilzadeh, Doug Burger

Predication of control edges has the potential advantages of improving fetch bandwidth and reducing branch mispredictions. However, heavily predicated code in out-of-order processors can lose significant performance by deferring resolution of the predicates until they are executed, whereas in nonpredicated code those control arcs would have remained as branches, and would be resolved immediatel...

Multiple Branch Prediction for Wide-Issue Superscalar

1999

Shu-Lin Hwang, Che-Chun Chen

Modern micro-architectures employ superscalar techniques to enhance system performance. Because superscalar microprocessors must fetch at least one instruction cache line at a time to support a high issue rate and a large amount of speculative execution, multiple branches are often encountered in one cycle. In a practical implementation this would cause a serious problem while ...
