Energy-efficient algebra kernels in FPGA for High Performance Computing
Authors
Abstract
The dissemination of multi-core architectures and the later irruption of massively parallel devices led to a revolution in High-Performance Computing (HPC) platforms in the last decades. As a result, Field-Programmable Gate Arrays (FPGAs) are re-emerging as a versatile and more energy-efficient alternative to other platforms. Traditional FPGA design implies using low-level Hardware Description Languages (HDL) such as VHDL or Verilog, which follow an entirely different programming model than standard software languages, and their use requires specialized knowledge of the underlying hardware. In recent years, manufacturers have started to make big efforts to provide High-Level Synthesis (HLS) tools, in order to allow a greater adoption of FPGAs in the HPC community. Our work studies the use of hardware to address Numerical Linear Algebra (NLA) kernels such as general matrix multiplication (GEMM) and sparse matrix-vector multiplication (SpMV). Specifically, we compare the behavior of fine-tuned CPU implementations and HLS implementations on FPGAs. We perform an experimental evaluation of our implementations on a low-end and a cutting-edge platform, in terms of runtime and energy consumption, and compare the results against the Intel MKL library on the CPU.
Similar resources
High-Performance Linear Algebra Processor using FPGA
With recent advances in FPGA (Field Programmable Gate Array) technology, it is now feasible to use these devices to build special-purpose processors for floating-point-intensive applications that arise in scientific computing. FPGAs provide programmable hardware that can be used to design custom hardware without the high cost of traditional hardware design. In this talk we discuss two multi-proc...
PAM-Blox: High Performance FPGA Design for Adaptive Computing
PAM-Blox are object-oriented circuit generators on top of the PCI Pamette design environment, PamDC. High-performance FPGA design for adaptive computing is simplified by using a hierarchy of optimized hardware objects described in C++. PAM-Blox consist of two major layers of abstraction. First, PamBlox are parameterizable simple elements such as counters and adders. Automatic placement of carry ...
IANUS: an FPGA-based System for High Performance Scientific Computing
This paper describes IANUS, a modular massively parallel and reconfigurable FPGA-based computing system. Each IANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the IANUS host, a conventional PC. IANUS is tailored for, but not...
Energy-Efficient FPGA-Based Parallel Quasi-Stochastic Computing
The high performance of FPGA (Field Programmable Gate Array) in image processing applications is justified by its flexible reconfigurability, its inherent parallel nature and the availability of a large amount of internal memories. Lately, the Stochastic Computing (SC) paradigm has been found to be significantly advantageous in certain application domains including image processing because of i...
Journal
Journal title: Journal of Computer Science and Technology
Year: 2021
ISSN: 1666-6046, 1666-6038
DOI: https://doi.org/10.24215/16666038.21.e09