A Performance Comparison of Sort and Scan Libraries for GPUs
نویسنده
چکیده
Sorting and scanning are two fundamental primitives for constructing highly parallel algorithms. A number of libraries now provide implementations of these primitives for GPUs, but there is relatively little information about the performance of these implementations. We benchmark seven libraries for 32-bit integer scan and sort, and sorting 32-bit values by 32-bit integer keys. We show that there is a large variation in performance between the libraries, and that no one library has both optimal performance and portability.
منابع مشابه
Efficient Primitives and Algorithms for Many-core architectures
OF THE DISSERTATION Efficient Primitives and Algorithms for Many-core architectures Graphics Processing Units (GPUs) are a fast evolving architecture. Over the last decade their programmability has been harnessed to solve non-graphics tasks—in many cases at a huge performance advantage to CPUs. Unlike CPUs, GPUs have always been a highly parallel architecture—thousands of lightweight execution ...
متن کاملAutotuning dense linear algebra libraries on GPUs
As GPUs are quickly evolving in complexity, tuning numerical libraries for them is becoming more challenging. We present an autotuning approach in the area of dense linear algebra (DLA) libraries for GPUs. The MAGMA library is used to demonstrate the techniques and their effect on performance and portability across hardware systems. We show that, figuratively speaking, our autotuning approach f...
متن کاملSorting On A Graphics Processing Unit(GPU)
2.1 Graphics Processing Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.2 Sorting Numbers on GPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 2.2.1 SDK Radix Sort Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 2.2.1.1 Step 1–Sorting tiles ...
متن کاملInvestigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)
Although Sparse matrix-vector multiplication (SPMVs) algorithms are simple, they include important parts of Linear Algebra algorithms in Mathematics and Physics areas. As these algorithms can be run in parallel, Graphics Processing Units (GPUs) has been considered as one of the best candidates to run these algorithms. In the recent years, power consumption has been considered as one of the metr...
متن کاملAccelerating high-order WENO schemes using two heterogeneous GPUs
A double-GPU code is developed to accelerate WENO schemes. The test problem is a compressible viscous flow. The convective terms are discretized using third- to ninth-order WENO schemes and the viscous terms are discretized by the standard fourth-order central scheme. The code written in CUDA programming language is developed by modifying a single-GPU code. The OpenMP library is used for parall...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Parallel Processing Letters
دوره 25 شماره
صفحات -
تاریخ انتشار 2015