Orthogonalization on a general purpose graphics processing unit with double double and quad double arithmetic

Authors

Jan Verschelde

Genady Yoffe

Abstract

Our problem is to accurately solve linear systems on a general purpose graphics processing unit with double double and quad double arithmetic. The linear systems originate from the application of Newton’s method on polynomial systems. Newton’s method is applied as a corrector in a path following method, so the linear systems are solved in sequence and not simultaneously. One solution path may require the solution of thousands of linear systems. In previous work we reported good speedups with our implementation to evaluate and differentiate polynomial systems on the NVIDIA Tesla C2050. Although the cost of evaluation and differentiation often dominates the cost of linear system solving in Newton’s method, because of the limited bandwidth of the communication between CPU and GPU, we cannot afford to send the linear system to the CPU for solving during path tracking. Because of large degrees, the Jacobian matrix may contain extreme values, requiring extended precision, leading to a significant overhead. This overhead of multiprecision arithmetic is our main motivation to develop a massively parallel algorithm. To allow overdetermined linear systems we solve linear systems in the least squares sense, computing the QR decomposition of the matrix by the modified Gram-Schmidt algorithm. We describe our implementation of the modified Gram-Schmidt orthogonalization method for the NVIDIA Tesla C2050, using double double and quad double arithmetic. Our experimental results show that the achieved speedups are sufficiently high to compensate for the overhead of one extra level of precision.
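As a rough illustration of the approach named in the abstract, the following is a minimal sequential sketch of a least squares solve via a QR decomposition computed with modified Gram-Schmidt. It is not the authors' GPU implementation: it runs on the CPU in ordinary double precision, assumes the matrix has full column rank, and the function names `mgs_qr` and `least_squares` are hypothetical.

```python
import math

def mgs_qr(A):
    """Modified Gram-Schmidt QR of an m-by-n matrix A (list of rows), m >= n.

    Returns (Qcols, R): Qcols is a list of n orthonormal columns of length m,
    R is an n-by-n upper triangular matrix with A = Q R.
    Assumption: A has full column rank, so no R[k][k] is zero.
    """
    m, n = len(A), len(A[0])
    # Copy A column by column; V[j] is updated in place as columns are orthogonalized.
    V = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    Qcols = []
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Normalize the current column.
        R[k][k] = math.sqrt(sum(v * v for v in V[k]))
        q = [v / R[k][k] for v in V[k]]
        Qcols.append(q)
        # Immediately remove the q-component from every remaining column.
        # This update order is what distinguishes the modified variant
        # from classical Gram-Schmidt.
        for j in range(k + 1, n):
            R[k][j] = sum(qi * vi for qi, vi in zip(q, V[j]))
            V[j] = [vi - R[k][j] * qi for qi, vi in zip(q, V[j])]
    return Qcols, R

def least_squares(A, b):
    """Solve min ||A x - b||_2 via A = Q R, then back substitution on R x = Q^T b."""
    Qcols, R = mgs_qr(A)
    n = len(R)
    y = [sum(qi * bi for qi, bi in zip(q, b)) for q in Qcols]  # y = Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / R[i][i]
    return x

# Overdetermined example: fit y = c0 + c1 * t through (0, 1), (1, 3), (2, 5).
coeffs = least_squares([[1, 0], [1, 1], [1, 2]], [1, 3, 5])
```

In this sketch the inner loop over the remaining columns is the part with independent work per column, which is the kind of data parallelism a GPU version could exploit; how the paper actually maps the computation to threads is not shown here.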

Similar resources

Hybrid Multicore Cholesky Factorization with Multiple GPU Accelerators

We present a Cholesky factorization for multicore with GPU accelerators. The challenges in developing scalable high performance algorithms for these emerging systems stem from their heterogeneity, massive parallelism, and the huge gap between the GPUs’ compute power vs the CPU-GPU communication speed. We show an approach that is largely based on software infrastructures that have already been d...

Algorithms for Quad-Double Precision Floating Point Arithmetic

A quad-double number is an unevaluated sum of four IEEE double precision numbers, capable of representing at least 212 bits of significand. We present the algorithms for various arithmetic operations (including the four basic operations and various algebraic and transcendental operations) on quad-double numbers. The performance of the algorithms, implemented in C++, is also presented.
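To make the idea of an "unevaluated sum" concrete, here is a minimal sketch of the simpler double-double case (two doubles instead of four), built on the standard error-free `two_sum` transformation. The tuple representation `(hi, lo)` and the function name `dd_add` are assumptions made for illustration, and the addition shown is the fast, less accurate ("sloppy") variant, not the full quad-double algorithms the abstract refers to.

```python
def two_sum(a, b):
    """Error-free transformation: returns (s, e) where s is the rounded sum
    a + b and e is the rounding error, so that a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def quick_two_sum(a, b):
    """Same result as two_sum with fewer operations; exact when |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e

def dd_add(x, y):
    """Add two double-double numbers given as (hi, lo) pairs.

    Sketch of the 'sloppy' addition: the low parts are accumulated in plain
    double arithmetic, which is fast but can lose accuracy under cancellation.
    """
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)

tiny = 2.0 ** -60

# In plain doubles the small term is lost entirely.
plain = (1.0 + tiny) - 1.0

# As a double-double, the small term survives in the low part.
total = dd_add((1.0, 0.0), (tiny, 0.0))
recovered = dd_add(total, (-1.0, 0.0))
```

Here `plain` is `0.0`, while `recovered` holds `2.0 ** -60` in its high part: the pair `(1.0, 2.0 ** -60)` represents a value that no single double can.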

Tracking Many Solution Paths of a Polynomial Homotopy on a Graphics Processing Unit

Polynomial systems occur in many areas of science and engineering. Unlike general nonlinear systems, the algebraic structure makes it possible to compute all solutions of a polynomial system. We describe our massively parallel predictor-corrector algorithms to track many solution paths of a polynomial homotopy. The data parallelism that provides the speedups stems from the evaluation and differentiation of...

Large Scale Terrain Real-Time Rendering on GPU Using Double Layers Tile Quad Tree and Cuboids Bounding Error Metric

Improving terrain tile data selection efficiency, real-time loading of visible tile data and building GPU-based continuous Level of Details (LOD) are the key technologies for large scale terrain rendering on GPU. In this article, in order to reduce terrain tile data selection time, we build double layers tile quad tree for massive terrain data and organize tile data by designing Z-order space f...

Fast Poisson Solvers for Graphics Processing Units

Two block cyclic reduction linear system solvers are considered and implemented using the OpenCL framework. The topics of interest include a simplified scalar cyclic reduction tridiagonal system solver and the impact of increasing the radix-number of the algorithm. Both implementations are tested for the Poisson problem in two and three dimensions, using a Nvidia GTX 580 series GPU and double p...

Journal title:

CoRR

Volume abs/1210.0800

Publication date 2012