Performance Tradeoff Spectrum of Integer and Floating Point Applications Kernels on Various GPUs
Author
Abstract
Floating point precision and performance, together with the ratio of floating point units to integer processing elements on a graphics processing unit (GPU) accelerator, continue to present complex tradeoffs for optimising core utilisation on modern devices. We investigate various hybrid CPU and GPU combinations using a range of GPU models that occupy different points in this tradeoff space. We analyse performance data for a range of numerical simulation kernels and discuss their use as benchmark problems for characterising such devices.
Similar resources
Tradeoff of FPGA Design of a Floating-point Library for Arithmetic Operators
Most engineering and scientific applications involve implementing complex algorithms that are based on arithmetic operators [1]. Fixed-point arithmetic allows computations to be performed with high precision according to the bitwidth representation. However, many applications need to work not only with high precision, but also with a suitable format in order to re...
Branch Write Back SRU FPU 3 FPU 2 FPU 1 FX
This paper presents the performance of DSP, image and 3D applications on recent general-purpose microprocessors using streaming SIMD ISA extensions (integer and floating point). The 9 benchmarks we use for this evaluation have been optimized for DLP and cache use with SIMD extensions and data prefetch. The result of these cumulated optimizations is a speedup that ranges from 1.9 to 7.1...
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures
This paper presents new algorithmic approaches and optimization techniques for LU factorization and matrix inversion of millions of very small matrices using GPUs. These problems appear in many scientific applications including astrophysics and generation of block-Jacobi preconditioners. We show that, for very small problem sizes, design and optimization of GPU kernels require a mindset differe...
An Improved MAGMA GEMM for Fermi GPUs
We present an improved matrix-matrix multiplication routine (GEMM) in the MAGMA BLAS library that targets Fermi GPUs. We show how to modify the previous MAGMA GEMM kernels to make more efficient use of Fermi’s new architectural features, most notably its extended memory hierarchy and sizes. The improved kernels run at up to 300 GFlop/s in double and up to 600 GFlop/s in sin...
Design Heuristics for Mapping Floating-Point Scientific Computational Kernels onto High Performance Reconfigurable Computers
Because of the increasing need to develop efficient high-speed computational kernels, researchers have been looking at various acceleration technologies. One approach is to use field programmable gate arrays (FPGAs) in conjunction with general purpose processors to form what are known as high performance reconfigurable computers (HPRCs). HPRCs have already been shown to work well for both fixed...