instruction fetch

Overview of the IA-64 Architecture

2000

Karthik Swaminathan

Introduction. Architecture Design: 1. Support for two Operating System Environments. 2. Ability to handle IA-32 Instruction sets in the IA-64 ope...

Dustin: A 16-Cores Parallel Ultra-Low-Power Cluster With 2b-to-32b Fully Flexible Bit-Precision and Vector Lockstep Execution Mode

Journal: IEEE Transactions on Circuits and Systems I: Regular Papers, 2023

Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms to resource-constrained and battery-powered devices while retaining the flexibility granted by instruction processor-based architectures poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bit...

Synthesizing Timed Circuits from High Level Specification Languages

2003

Tomohiro Yoneda Chris Myers

This work proposes an efficient methodology to synthesize timed circuits from high-level specification languages. In particular, this paper presents a systematic procedure for translating channel-level models to time Petri net descriptions. Care is taken in this translation to guarantee that there are no state coding violations in the resulting nets, greatly simplifying the synthesis process. Th...

Complexity-effective Enhancements to a RISC CPU Architecture

2007

Jeff Scott John Arends Bill Moyer

The M•CORE™ RISC architecture has been developed to address the growing need for long battery life among today’s embedded applications [4]. In this paper, we present several architectural enhancements to the M•CORE M3 processor. Specifically, we discuss the burst mode memory enhancements, the instruction fetch enhancements, the selectable branch prediction implementation, and the improvements ...

Performance Evaluations of a Multithreaded Java Microcontroller

2000

Jochen Kreuzinger Matthias Pfeffer A. Schulz Theo Ungerer Uwe Brinkschulte C. Krakowski

We propose handling of external real-time events through multithreading and describe the microarchitecture of our multithreaded Java microcontroller, called the Komodo microcontroller. Real-time Java threads are used as interrupt service threads (ISTs) instead of interrupt service routines (ISRs). Our proposed Komodo microcontroller supports multiple ISTs with zero-cycle context switching overhead. We eva...

Implementation of Atomic Primitives on Distributed Shared Memory Multiprocessors

1995

Maged M. Michael Michael L. Scott

In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch_and_Φ, compare_and_swap, load_linked, and store_conditional on large-scale shared-memory multiprocessors. These primitives have proven popular on small-scale bus-based machines, but have yet to become widely available on large-scale, distributed shared memory machines. We propose several alt...
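
For readers unfamiliar with the primitives named in this abstract, the following is a minimal software sketch, not the hardware implementations the paper evaluates. It assumes a hypothetical `Cell` class whose lock stands in for hardware atomicity, and shows how a fetch-and-Φ operation can be built from a compare-and-swap retry loop. All names here (`Cell`, `fetch_and_phi`, `demo`) are illustrative assumptions.

```python
import threading


class Cell:
    """One simulated memory word. The lock is an assumption that stands in
    for the atomicity a real machine would provide in hardware."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Write `new` only if the cell still holds `expected`.
        Returns True on success, False otherwise."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def fetch_and_phi(cell, phi):
    """Atomically replace the cell's value v with phi(v) and return the old v.
    Retries whenever another thread changed the cell in between."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, phi(old)):
            return old


def demo(n_threads=4, n_iters=500):
    """Several threads increment a shared counter via fetch_and_phi."""
    counter = Cell(0)

    def worker():
        for _ in range(n_iters):
            fetch_and_phi(counter, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.load()
```

With Φ chosen as "add one", `fetch_and_phi` behaves like fetch-and-increment, so `demo` loses no updates even though the threads interleave.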

Very-Wide-Issue Superscalar Microengine Configurations

2007

Steve Bennett Bin Wang

To continue the microprocessor performance improvements made in the last two decades, instruction-level parallelism must be exploited across multiple basic block boundaries. This necessity has led to execution engines that dynamically predict a stream of instructions that are executed concurrently. As issue widths increase, former assumptions about requirements for execution resources such as inter...

Trace Cache Redundancy: Red & Blue Traces

2000

Alex Ramírez Josep-Lluís Larriba-Pey Mateo Valero

The objective of this paper is to improve the use of the hardware resources of the trace cache mechanism, reducing the implementation cost with no performance degradation. We achieve this by eliminating the replication of traces between the instruction cache and the trace cache. As we show, the trace cache mechanism generates a high degree of redundancy between the traces stored in the trac...

The Performance Potential of Data Value Reuse

1998

Antonio González Jordi Tubella Carlos Molina

This paper presents a study of the performance limits of data value reuse. Two types of data value reuse are considered: instruction-level reuse and trace-level reuse. The former reuses instances of single instructions whereas the latter reuses sequences of instructions as an atomic unit. Two different scenarios are considered: an infinite resource machine and a machine with a limited instructi...
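
As a rough illustration of the instruction-level reuse described in this abstract, the toy sketch below keeps a table of previously seen instruction instances and skips re-execution on a hit. It is not the paper's mechanism or its limit study: the trace format, the `(pc, operands)` key, the unbounded table, and the names `OPS` and `run_with_reuse` are all assumptions made for this example.

```python
import operator

# Hypothetical two-operand opcodes used only for this sketch.
OPS = {"add": operator.add, "mul": operator.mul}


def run_with_reuse(trace):
    """Execute a trace of (pc, opcode, a, b) tuples.

    An instruction instance is reused when the same pc was already executed
    with the same operand values; its stored result is returned instead of
    recomputing it. Returns (results, number_of_reused_instructions).
    """
    reuse_table = {}  # (pc, a, b) -> previously computed result
    results = []
    reused = 0
    for pc, opcode, a, b in trace:
        key = (pc, a, b)
        if key in reuse_table:
            reused += 1
            results.append(reuse_table[key])
        else:
            value = OPS[opcode](a, b)
            reuse_table[key] = value
            results.append(value)
    return results, reused


trace = [
    (0, "add", 1, 2),
    (4, "mul", 3, 3),
    (0, "add", 1, 2),  # same pc and operands as the first entry: reused
    (0, "add", 2, 2),  # same pc, different operands: executed again
]
results, reused = run_with_reuse(trace)
```

Trace-level reuse would apply the same idea to a whole sequence of instructions, keyed on the sequence's live-in values, rather than to one instruction at a time.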

One-Level Cache Memory Design for Scalable SMT Architectures

2004

Muhamed F. Mudawar John R. Wani

The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past decade. Instead, larger unified L2 and L3 caches were introduced. This cache hierarchy has a high overhead due to the principle of containment, as all the cache blocks in the upper level caches are contained in the lower...
