Implementation of a Streaming Execution Unit
Abstract
The Complex Streamed Instruction (CSI) set is an ISA extension targeted at multimedia applications. CSI instructions process two-dimensional data streams stored in memory, performing sectioning, data alignment and conversion between different packed data types all in hardware. It has been shown previously that CSI provides significant speedups compared to current media ISA extensions such as MMX and VIS. This paper presents a detailed design of a unit that can execute CSI instructions under the assumption that the unit is interfaced with the L1 data cache. In particular, it is shown that the complex, two-dimensional, address-generation calculations can be performed in a pipelined fashion and implemented using a three-stage pipeline with acceptable delay and hardware cost.
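As a rough software illustration of the two-dimensional address generation mentioned in the abstract, the sketch below enumerates the byte addresses of a 2D stream and splits them into fixed-size sections. It is not the paper's hardware design: the parameter names (`base`, `row_len`, `num_rows`, `elem_stride`, `row_stride`, `section_size`) are hypothetical, and the row-major traversal order is an assumption made for illustration.

```python
from typing import Iterable, Iterator, List


def stream_addresses(base: int, row_len: int, num_rows: int,
                     elem_stride: int, row_stride: int) -> Iterator[int]:
    """Yield byte addresses of a 2D stream in row-major order.

    All parameters are hypothetical names chosen for this sketch:
    base        -- address of the first element
    row_len     -- elements per row
    num_rows    -- number of rows
    elem_stride -- byte distance between consecutive elements in a row
    row_stride  -- byte distance between the starts of consecutive rows
    """
    for row in range(num_rows):
        row_base = base + row * row_stride
        for col in range(row_len):
            yield row_base + col * elem_stride


def sections(addresses: Iterable[int], section_size: int) -> Iterator[List[int]]:
    """Group addresses into sections of at most section_size elements.

    This mimics, in software, splitting a long stream into pieces that fit
    a fixed-width unit (an assumption about what sectioning means here).
    """
    section: List[int] = []
    for addr in addresses:
        section.append(addr)
        if len(section) == section_size:
            yield section
            section = []
    if section:  # final, possibly shorter, section
        yield section


if __name__ == "__main__":
    # A 2-row by 4-element block of bytes inside a 16-byte-wide image.
    addrs = stream_addresses(base=0x1000, row_len=4, num_rows=2,
                             elem_stride=1, row_stride=16)
    for sec in sections(addrs, section_size=3):
        print([hex(a) for a in sec])
```

In this model, each address depends only on the row and column counters, which suggests why such a calculation might lend itself to pipelining, although the sketch says nothing about the delay or cost of the three-stage pipeline the paper describes.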
Similar resources
Implementation of the direction of arrival estimation algorithms by means of GPU-parallel processing in the Kuda environment (Research Article)
Direction-of-arrival (DOA) estimation of audio signals is critical in different areas, including electronic war, sonar, etc. The beamforming methods like Minimum Variance Distortionless Response (MVDR), Delay-and-Sum (DAS), and subspace-based Multiple Signal Classification (MUSIC) are the most known DOA estimation techniques. The mentioned methods have high computational complexity. Hence using...
Pentium III Processor Implementation Tradeoffs
This paper discusses the implementation tradeoffs of the Pentium III processor. The Pentium III processor implements a new extension of the IA-32 instruction set called the Internet Streaming Single-Instruction, Multiple-Data (SIMD) Extensions (Internet SSE). The processor is based on the Pentium Pro processor microarchitecture. The initial development goals for the Pentium III processor were ...
ClawHMMER: A Streaming HMMer-Search Implementation
The proliferation of biological sequence data has motivated the need for an extremely fast probabilistic sequence search. One method for performing this search involves evaluating the Viterbi probability of a hidden Markov model (HMM) of a desired sequence family for each sequence in a protein database. However, one of the difficulties with current implementations is the time required to search...
Exploiting the Data-level Parallelism in Modern Microprocessors for Neural Network Simulation
Fast SIMD-parallel execution units are available in most modern microprocessors. They provide an internal parallelism degree in the range from 2 to 16 and can accelerate many data-parallel algorithms. In this paper the suitability of five different SIMD units (Intel's MMX and SSE, AMD's 3DNow!, Motorola's AltiVec and Sun's VIS) for the simulation of neural networks is compared. The appropriateness...
Journal title:
- Journal of Systems Architecture
Volume 49
Publication date: 2002