Accelerating Pipelined Integer and Floating-Point Accumulations in Configurable Hardware with Delayed Addition Techniques
نویسندگان
چکیده
ÐThe speed of arithmetic calculations in configurable hardware is limited by carry propagation, even with the dedicated hardware found in recent FPGAs. This paper proposes and evaluates an approach called delayed addition that reduces the carrypropagation bottleneck and improves the performance of arithmetic calculations. Our approach employs the idea used in Wallace trees to store the results in an intermediate form and delay addition until the end of a repeated calculation such as accumulation or dotproduct; this effectively removes carry propagation overhead from the calculation's critical path. We present both integer and floatingpoint designs that use our technique. Our pipelined integer multiply-accumulate (MAC) design is based on a fairly traditional multiplier design, but with delayed addition as well. This design achieves a 72MHz clock rate on an XC4036xla-9 FPGA and 170MHz clock rate on an XV300epq240-8 FPGA. Next, we present a 32-bit floating-point accumulator based on delayed addition. Here, delayed addition requires a novel alignment technique that decouples the incoming operands from the accumulated result. A conservative version of this design achieves a 40 MHz clock rate on an XC4036xla-9 FPGA and 97MHz clock rate on an XV100epq240-8 FPGA. We also present a 32-bit floating-point accumulator design with compiler-managed overflow avoidance that achieves a 80MHz clock rate on an XC4036xla-9 FPGA and 150MHz clock rate on an XCV100epq240-8 FPGA. Index TermsÐDelayed addition, accumulation, multiply-accumulate, MAC, FPGA.
منابع مشابه
Using Delayed Addition Techniques to Accelerate Integer and Floating- Point Calculations in Configurable Hardware
This paper proposes and evaluates an approach for improving the performance of arithmetic calculations via delayed addition. Our approach employs the idea used in Wallace trees to delay addition until the end of a repeated calculation such as accumulation or dot-product; this effectively removes carry propagation overhead from the calculation’s critical path. We present integer and floating-poi...
متن کاملA Configurable Floating-Point Discrete Hilbert Transform Processor for Accelerating the Calculation of Filter in Katsevich Formula
Katsevich formula is currently a hot topic for cone-beam computed tomography (CBCT). The filter in the formula can be computed by a regular discrete Hilbert transform (DHT). A configurable single precision floating-point (SPFP) DHT processor is proposed for accelerating the calculation of filter in Katsevich formula. The configurable processor is of memory based architecture with one pipelined ...
متن کاملA Pipelined, Single Precision Floating-Point Logarithm Computation Unit in Hardware
A large number of scientific applications rely on the computing of logarithm. Thus, accelerating the speed of computing logarithms is significant and necessary. To this end, we present the realization of a pipelined Logarithm Computation Unit (LCU) in hardware that uses lookup table and interpolation techniques. The presented LCU supports single precision arithmetic with fixed accuracy and spee...
متن کاملRapid Implementation of Floating-point Computations Using Phase-Coherent Dynamically Configurable Pipelines
The Phase-Coherent Dynamically Configurable Pipeline is a concept for the rapid implementation of pipelined computational algorithms in configurable hardware. The approach allows a high level of sharing of floating-point resources among multiple computations. The concept features a simple tag-based control scheme and a sparse-pipeline allocation approach that enables all the stages of an arithm...
متن کاملFPGA Optimizations for a Pipelined Floating-Point Exponential Unit
The large number of available DSP slices on new-generation FPGAs allows for efficient mapping and acceleration of floating-point intensive codes. Numerous scientific codes heavily rely on executing the exponential function. To this end, we present the design and implementation of a pipelined CORDIC/TD-based (COrdinate Rotation DIgital Computer/Table Driven) Exponential Approximation Unit (EAU) ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEEE Trans. Computers
دوره 49 شماره
صفحات -
تاریخ انتشار 2000