A parallel arithmetic array for accelerating compute-intensive applications
Authors
Abstract
A parallel arithmetic array processor for accelerating compute-intensive applications in low-power embedded systems is proposed in this study. The proposed flexible hardware architecture enables the fast execution of both control-dominated and compute-centric streaming computation tasks on the same array. Consequently, multiple levels of parallelism can be efficiently exploited. A test chip integrating two 16×16 array processor cores was implemented in 65 nm CMOS technology. Multi-format video decoding algorithms were mapped onto the chip as benchmarks. The proposed architecture achieved a 2.8× performance advantage over an industrial coarse-grained array processor and a 66% performance boost over a state-of-the-art many-core processor. Meanwhile, energy efficiency was improved by 15.3× and 1.78×, respectively.
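The reported ratios can be put on a common footing with a short calculation. The sketch below is illustrative only and rests on an assumption the abstract does not state: that energy efficiency means performance per watt on the same benchmark, so relative power is the speedup divided by the efficiency gain. The function name `relative_power` is hypothetical, and the 66% boost is read as a 1.66× speedup.

```python
# Illustrative only: assumes energy efficiency = performance per watt
# on the same workload (not stated in the abstract).

def relative_power(speedup: float, efficiency_gain: float) -> float:
    """Power of the proposed chip relative to a baseline, under the assumption above."""
    return speedup / efficiency_gain

# Figures taken from the abstract.
vs_cgra = relative_power(2.8, 15.3)        # vs. industrial coarse-grained array processor
vs_manycore = relative_power(1.66, 1.78)   # vs. many-core processor (66% boost = 1.66x)

print(f"relative power vs. coarse-grained array: {vs_cgra:.3f}")
print(f"relative power vs. many-core processor:  {vs_manycore:.3f}")
```

Under that assumption, the chip would draw roughly a fifth of the power of the coarse-grained array and slightly less than the many-core processor; these are derived estimates, not figures reported in the paper.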
Similar resources
Accelerating Mobile Applications at the Network Edge with Software-Programmable FPGAs
Recently, Edge Computing has emerged as a new computing paradigm dedicated to mobile applications for performance enhancement and energy efficiency purposes. Specifically, it benefits today’s interactive applications on power-constrained devices by offloading compute-intensive tasks to edge nodes that are in close proximity. Meanwhile, Field Programmable Gate Array (FPGA) is well known for...
A Massively Parallel Digital Learning Processor
We present a new, massively parallel architecture for accelerating machine learning algorithms, based on arrays of vector processing elements (VPEs) with variable-resolution arithmetic. Groups of VPEs operate in SIMD (single instruction multiple data) mode, and each group is connected to an independent memory bank. The memory bandwidth thus scales with the number of VPEs, while the main data fl...
Fracturable DSP Block for Multi-context Reconfigurable Architectures
Multi-context architectures like NATURE enable low-power applications to leverage fast context switching for improved energy efficiency and lower area footprint. The NATURE architecture incorporates 16-bit reconfigurable DSP blocks for accelerating arithmetic computations; however, their fixed precision prevents efficient re-use in mixed-width arithmetic circuits. This paper presents an improve...
Numerical Solutions of Differential Equations on FPGA-Enhanced Computers
Numerical Solutions of Differential Equations on FPGA-Enhanced Computers. (May 2007) Chuan He, B.S., Shandong University; M.S., Beijing University of Aeronautics and Astronautics. Co-Chairs of Advisory Committee: Dr. Mi Lu, Dr. Wei Zhao. Conventionally, to speed up scientific or engineering (S&E) computation programs on general-purpose computers, one may elect to use faster CPUs, more memory, syst...
Design and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL
A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. Systolic concept is a well-known means of intensive computational task through replication of func...
Journal title:
- IEICE Electronics Express
Volume 11, issue
Pages -
Publication date 2014