Effective GPU Strategies for LU Decomposition

Authors

  • H. M. D. M. Bandara
  • D. N. Ranasinghe
Abstract

GPUs are becoming an attractive computing platform not only for traditional graphics computation but also for general-purpose computation, owing to their computational power, programmability, and comparatively low cost. This has led to a variety of complex GPGPU applications with significant performance improvements. LU decomposition is a fundamental step in many computationally intensive scientific applications, and it is often the costly step in the solution process because of the size of the matrix. In this paper we implement three variants of the LU decomposition algorithm on a Tesla C1060. The variant found to fit the highly parallel architecture of modern GPUs best is the Update through Column implementation with shared memory access.

Keywords: LU decomposition, CUDA, GPGPU
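For orientation, the following is a minimal CPU sketch of a column-oriented LU decomposition (Doolittle form, no pivoting). It is not the authors' CUDA kernel: the function name `lu_column_update` is hypothetical, pivoting is omitted, and the Tesla C1060 shared-memory access pattern described in the abstract is not modelled. It only illustrates why the trailing update is a natural fit for parallel hardware: within each elimination step, every column of the trailing submatrix can be updated independently.

```python
def lu_column_update(a):
    """Return (L, U) with L unit lower triangular and L @ U == a.

    Illustrative sketch only: no pivoting, so it assumes every pivot is nonzero.
    """
    n = len(a)
    m = [row[:] for row in a]  # work on a copy; L and U are packed into m
    for k in range(n - 1):
        pivot = m[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Scale column k below the diagonal: these are the multipliers (column k of L).
        for i in range(k + 1, n):
            m[i][k] /= pivot
        # Update the trailing submatrix one column at a time.
        # Columns j are independent of each other, so a GPU could assign
        # one thread (or thread block) per column in this loop.
        for j in range(k + 1, n):
            ukj = m[k][j]
            for i in range(k + 1, n):
                m[i][j] -= m[i][k] * ukj
    # Unpack the strictly lower part into L (with unit diagonal) and the rest into U.
    L = [[m[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[m[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


if __name__ == "__main__":
    A = [[2.0, 1.0, 1.0],
         [4.0, 3.0, 3.0],
         [8.0, 7.0, 9.0]]
    L, U = lu_column_update(A)
    print(L)
    print(U)
```

A production solver would add partial pivoting for numerical stability; it is left out here to keep the column-update structure visible.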


Similar resources

Parallel Triangular Solvers on GPU

In this paper, we investigate GPU based parallel triangular solvers systematically. The parallel triangular solvers are fundamental to incomplete LU factorization family preconditioners and algebraic multigrid solvers. We develop a new matrix format suitable for GPU devices. Parallel lower triangular solvers and upper triangular solvers are developed for this new data structure. With these solv...


Randomized LU Decomposition Using Sparse Projections

A fast algorithm for the approximation of a low rank LU decomposition is presented. In order to achieve a low complexity, the algorithm uses sparse random projections combined with FFT-based random projections. The asymptotic approximation error of the algorithm is analyzed and a theoretical error bound is presented. Finally, numerical examples illustrate that for a similar approximation error, ...


Parallelization of the LU Decomposition on Heterogeneous Systems

With the appearance of GPUs as valid platforms, not only for graphics computation but also for general-purpose computation, applications that exploit hybrid/heterogeneous systems can be made available to the mass market due to the widespread availability of these systems. Correct distribution of the workload of these applications can lead to significant performance boosts to complex applicati...


Automatically Tuned Dense Linear Algebra for Multicore+GPU

The Multicore+GPU architecture has been adopted in some of the fastest supercomputers listed on the TOP500. The MAGMA project aims to develop a dense linear algebra library similar to LAPACK but for heterogeneous/hybrid architectures such as Multicore+GPU. However, to provide portable performance, manual parameter tuning is required. This paper presents automatically tuned LU factorizat...


Locality Optimization on a NUMA Architecture for Hybrid LU Factorization

We study the impact of non-uniform memory accesses (NUMA) on the solution of dense general linear systems using an LU factorization algorithm. In particular we illustrate how an appropriate placement of the threads and memory on a NUMA architecture can improve the performance of the panel factorization and consequently accelerate the global LU factorization. We apply these placement strategies ...




Publication date: 2011