GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication
Authors
Abstract
Many high performance computing applications require computing both the sparse matrix-vector product (SMVP) and the sparse matrix-transpose vector product (SMTVP) for better overall performance. Under such a circumstance, it is critical to sustain a similarly high throughput for these two computing patterns with the underlying sparse matrix encoded in a single storage format. The compressed sparse block (CSB) format proposed by Buluç et al. allows both products to be computed on multi-core CPUs with nearly identical throughputs. On the other hand, a direct porting of CSB to graphics processing units (GPUs), which have recently been recognized as a powerful general purpose computing platform, turns out to be inefficient. In this work, we propose a new data structure, designated as expanded CSB (eCSB), that minimizes the throughput gap between SMVP and SMTVP computations on GPUs while at the same time enabling a high computing throughput. We also use a hybrid storage format for the elements within each block, which can be selected dynamically at runtime. Experimental results show that the proposed techniques implemented on a Kepler GPU deliver similar throughput for both SMVP and SMTVP, up to 13 times higher than that of the CPU-based CSB implementation. In addition, our eCSB procedure outperforms previous GPU results by up to 188% and 914% in computing SMVP and SMTVP, respectively. We further validate the effectiveness of eCSB by measuring the wall-clock time of the bi-conjugate gradient algorithm, in which eCSB is 25% faster than compressed sparse row (CSR) and 6% faster than HYB. Copyright © 2014 John Wiley & Sons, Ltd.
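As background to the abstract's claim, the gap between SMVP and SMTVP arises because a row-oriented format serves one direction well and the other poorly. The following CUDA sketch is a minimal, hypothetical illustration of that asymmetry using plain CSR arrays; it is not the paper's eCSB or CSB kernel, and all names in it (csr_smvp, csr_smtvp, the 3x3 example matrix) are illustrative assumptions. With the same CSR data, y = Ax needs no synchronization, while y = A^T x must scatter into the output with atomics, which is the gap block-based formats such as CSB/eCSB aim to close.

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per row: y = A*x with A in CSR. Each thread writes a private
// y[row], so no synchronization is needed.
__global__ void csr_smvp(int nrows, const int *rowPtr, const int *colIdx,
                         const float *val, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) return;
    float sum = 0.0f;
    for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
        sum += val[j] * x[colIdx[j]];
    y[row] = sum;
}

// y = A^T * x from the SAME CSR arrays: each nonzero scatters into
// y[colIdx[j]], so concurrent threads may target the same output element
// and atomics become necessary -- the source of the SMVP/SMTVP gap.
__global__ void csr_smtvp(int nrows, const int *rowPtr, const int *colIdx,
                          const float *val, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows) return;
    for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
        atomicAdd(&y[colIdx[j]], val[j] * x[row]);
}

int main() {
    // 3x3 example matrix: [10 0 0; 0 20 30; 0 0 40], x = [1 2 3].
    std::vector<int>   rowPtr = {0, 1, 3, 4};
    std::vector<int>   colIdx = {0, 1, 2, 2};
    std::vector<float> val    = {10, 20, 30, 40};
    std::vector<float> x      = {1, 2, 3};
    int n = 3, nnz = 4;

    int *dRow, *dCol; float *dVal, *dX, *dY, *dYt;
    cudaMalloc(&dRow, (n + 1) * sizeof(int));
    cudaMalloc(&dCol, nnz * sizeof(int));
    cudaMalloc(&dVal, nnz * sizeof(float));
    cudaMalloc(&dX, n * sizeof(float));
    cudaMalloc(&dY, n * sizeof(float));
    cudaMalloc(&dYt, n * sizeof(float));
    cudaMemcpy(dRow, rowPtr.data(), (n + 1) * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dCol, colIdx.data(), nnz * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dVal, val.data(), nnz * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dX, x.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dY, 0, n * sizeof(float));
    cudaMemset(dYt, 0, n * sizeof(float));

    csr_smvp<<<1, 32>>>(n, dRow, dCol, dVal, dX, dY);
    csr_smtvp<<<1, 32>>>(n, dRow, dCol, dVal, dX, dYt);

    std::vector<float> y(n), yt(n);
    cudaMemcpy(y.data(), dY, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(yt.data(), dYt, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("y[%d] = %g   yT[%d] = %g\n", i, y[i], i, yt[i]);   // expect y = 10 130 120, yT = 10 40 180
    return 0;
}

For this example the two kernels should print y = (10, 130, 120) and yT = (10, 40, 180). A CSB-style layout instead partitions the matrix into square blocks indexed by both row and column, so either product can be parallelized over blocks without the atomic scatter shown above.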
Similar Resources
Sparse Matrix-Vector Multiplication on NVIDIA GPU
In this paper, we present our work on developing a new matrix format and a new sparse matrix-vector multiplication algorithm. The matrix format is HEC, which is a hybrid format. This matrix format is efficient for sparse matrix-vector multiplication and is friendly to preconditioners. Numerical experiments show that our sparse matrix-vector multiplication algorithm is efficient on...
Techniques for Parallel Manipulation of Sparse Matrices
New techniques are presented for the manipulation of sparse matrices on parallel MIMD computers. We consider the following problems: matrix addition, matrix multiplication, row and column permutation, matrix transpose, matrix vector multiplication, and Gaussian elimination.
Transposition Mechanism for Sparse Matrices on Vector Processors
Many scientific applications involve operations on sparse matrices. However, due to irregularities induced by the sparsity patterns, many operations on sparse matrices execute inefficiently on traditional scalar and vector architectures. To tackle this problem a scheme has been proposed consisting of two parts: (a) An extension to a vector architecture to support sparse matrix-vector multiplica...
Fast Sparse Matrix-Vector Multiplication on GPUs: Implications for Graph Mining
Scaling up the sparse matrix-vector multiplication kernel on modern Graphics Processing Units (GPUs) has been at the heart of numerous studies in both academia and industry. In this article we present a novel non-parametric, self-tunable approach to data representation for computing this kernel, particularly targeting sparse matrices representing power-law graphs. Using real web graph data, we s...
Journal: Concurrency and Computation: Practice and Experience
Volume: 27, Issue: -
Pages: -
Publication date: 2015