Search results for: instruction fetch

Number of results: 42508

2000
Karthik Swaminathan

Introduction. Architecture Design: 1. Support for two Operating System Environments; 2. Ability to handle IA-32 Instruction sets in the IA-64 ope...

Journal: IEEE Transactions on Circuits and Systems I: Regular Papers 2023

Computationally intensive algorithms such as Deep Neural Networks (DNNs) are becoming killer applications for edge devices. Porting heavily data-parallel algorithms to resource-constrained and battery-powered devices while retaining the flexibility granted by instruction processor-based architectures poses several challenges related to memory footprint, computational throughput, and energy efficiency. Low-bit...

2003
Tomohiro Yoneda Chris Myers

This work proposes an efficient methodology to synthesize timed circuits from high-level specification languages. In particular, this paper presents a systematic procedure for translating channel-level models to time Petri net descriptions. Care is taken in this translation to guarantee that there are no state coding violations in the resulting nets, greatly simplifying the synthesis process. Th...

2007
Jeff Scott John Arends Bill Moyer

The M•CORE™ RISC architecture has been developed to address the growing need for long battery life among today’s embedded applications [4]. In this paper, we present several architectural enhancements to the M•CORE M3 processor. Specifically, we discuss the burst mode memory enhancements, the instruction fetch enhancements, the selectable branch prediction implementation, and the improvements ...

2000
Jochen Kreuzinger Matthias Pfeffer A. Schulz Theo Ungerer Uwe Brinkschulte C. Krakowski

We propose handling of external real-time events through multithreading and describe the microarchitecture of our multithreaded Java microcontroller, called the Komodo microcontroller. Real-time Java threads are used as interrupt service threads (ISTs) instead of interrupt service routines (ISRs). Our proposed Komodo microcontroller supports multiple ISTs with zero-cycle context switching overhead. We eva...

1995
Maged M. Michael Michael L. Scott

In this paper we consider several hardware implementations of the general-purpose atomic primitives fetch_and_Φ, compare_and_swap, load_linked, and store_conditional on large-scale shared-memory multiprocessors. These primitives have proven popular on small-scale bus-based machines, but have yet to become widely available on large-scale, distributed shared memory machines. We propose several alt...

2007
Steve Bennett Bin Wang

To continue the microprocessor performance improvements made in the last two decades, instruction-level parallelism must be exploited across multiple basic block boundaries. This necessity has led to execution engines that dynamically predict a stream of instructions to be executed concurrently. As issue widths increase, former assumptions about requirements for execution resources such as inter...

2000
Alex Ramírez Josep-Lluís Larriba-Pey Mateo Valero

The objective of this paper is to improve the use of the hardware resources of the trace cache mechanism, reducing the implementation cost with no performance degradation. We achieve that by eliminating the replication of traces between the instruction cache and the trace cache. As we show, the trace cache mechanism is generating a high degree of redundancy between the traces stored in the trac...

1998
Antonio González Jordi Tubella Carlos Molina

This paper presents a study of the performance limits of data value reuse. Two types of data value reuse are considered: instruction-level reuse and trace-level reuse. The former reuses instances of single instructions whereas the latter reuses sequences of instructions as an atomic unit. Two different scenarios are considered: an infinite resource machine and a machine with a limited instructi...

2004
Muhamed F. Mudawar John R. Wani

The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past decade. Instead, larger unified L2 and L3 caches were introduced. This cache hierarchy has a high overhead due to the principle of containment, as all the cache blocks in the upper level caches are contained in the lower...
