An Approach to Automatic Tuning for Parallel Householder QR Decomposition
Abstract
We consider parallel computation of the Householder QR decomposition on SMP machines. This decomposition is one of the basic tools in matrix computations and is used in various problems such as the least squares problem and the singular value decomposition of a rectangular matrix. Since the algorithm consists almost entirely of BLAS routines such as matrix-vector multiplications, the simplest way to parallelize it is to parallelize each BLAS routine. Moreover, with the blocking technique [1], we can use matrix-matrix multiplications, which can be parallelized efficiently. On the other hand, the TSQR algorithm was proposed in 2007 [2]. This algorithm makes coarse-grain parallelization of the QR decomposition of a tall-skinny matrix possible. Additionally, combined with the blocking technique, it can be applied to the QR decomposition of matrices that are not necessarily tall and skinny [3]. For efficient parallel computing, we need to consider two points at the same time: how to combine the BLAS-level parallelism with the TSQR algorithm, and how to partition a matrix into blocks. In our poster, we aim to determine these two things automatically, depending on the target machine and the size of the target matrix. In our approach, we first identify the parameters to optimize. Next, we define an objective function based on the hierarchical structure of the computation. We show that the optimization problem can be transformed into the Bellman equation, which can be solved using dynamic programming [4]. Finally, we propose a practical solution based on a performance model. Performance evaluation on a Xeon processor with 8 cores shows that the performance of the computation tuned with our approach is about as high as that tuned by hand.
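To illustrate how a block-partitioning choice can be cast as a Bellman recursion, here is a minimal sketch in plain Python. It is not the authors' objective function or performance model: the cost function `toy_panel_cost`, its rate constants `slow`, `fast`, and `overhead`, and the function `optimal_partition` are all hypothetical names and values chosen for illustration. The sketch assumes a matrix with m >= n, assumes that factoring a panel of width b runs at a slow matrix-vector rate while the trailing update runs at a faster matrix-matrix rate, and then picks the panel widths that minimize the modeled total cost.

```python
def toy_panel_cost(m, n, j, b, slow=1.0, fast=0.25, overhead=5.0e4):
    """Hypothetical cost of factoring columns j..j+b-1 of an m x n matrix
    (m >= n assumed) and applying the result to the remaining columns."""
    rows = m - j
    panel = slow * 2.0 * rows * b * b             # unblocked, matrix-vector style work
    update = fast * 4.0 * rows * b * (n - j - b)  # blocked, matrix-matrix style work
    return overhead + panel + update              # overhead: fixed per-panel cost


def optimal_partition(m, n, cost=toy_panel_cost, max_width=None):
    """Solve T(j) = min_b [cost(j, b) + T(j + b)], with T(n) = 0,
    by backward dynamic programming. Returns (T(0), list of panel widths)."""
    max_width = max_width or n
    best = [0.0] * (n + 1)   # best[j]: minimal modeled cost for columns j..n-1
    choice = [0] * (n + 1)   # choice[j]: panel width achieving best[j]
    for j in range(n - 1, -1, -1):
        best[j] = float("inf")
        for b in range(1, min(max_width, n - j) + 1):
            c = cost(m, n, j, b) + best[j + b]
            if c < best[j]:
                best[j], choice[j] = c, b
    # Walk forward through the stored choices to recover the partition.
    widths = []
    j = 0
    while j < n:
        widths.append(choice[j])
        j += choice[j]
    return best[0], widths


if __name__ == "__main__":
    total, widths = optimal_partition(4096, 512, max_width=128)
    print("modeled cost:", total)
    print("panel widths:", widths)
```

In an actual tuning setting, the toy cost function would be replaced by measured or modeled timings for the target machine, and the state could carry further parameters (for example, how many threads are given to TSQR versus to BLAS). The recursion itself would keep the same shape.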
Similar resources
Vector and Parallel Tuning of Solid Earth Simulation Codes - GeoFEM and Householder QR Decomposition -
In this paper, we discuss vector and parallel tuning of GeoFEM and the Householder QR decomposition process, which are solid earth simulation codes. The GeoFEM code can be roughly divided into two parts: the Matrix Assemble part and the Solver part. Currently, GeoFEM's parallel iterative solver has attained good parallel and vector performance, and GeoFEM's Matrix Assemble part has attained good para...
Implementing Communication-optimal Parallel and Sequential QR Factorizations
We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication and are as stable as Householder QR. The sequential and parallel algorithms for tall and skinny matrices lead to significant speedups in practice over some of the existing algorithms, including LAPACK and ScaLAPACK, for example up t...
Recursive least-squares using a hybrid Householder algorithm on massively parallel SIMD systems
Within the context of recursive least-squares, the implementation of a Householder algorithm for block updating the QR decomposition, on massively parallel SIMD systems, is considered. Initially, two implementations based on different mapping strategies for distributing the data matrices over the processing elements of the parallel computer are investigated. Timing models show that neither of th...
Communication-optimal Parallel and Sequential QR and LU Factorizations
We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform and just as stable as Householder QR. We prove optimality by deriving new lower bounds for the number of multiplications done by “non-Strassen-like” QR, and using these in known communication lower bounds that are proportional to ...
Matrix Decomposition
5 QR Decomposition: 5.1 Householder Reflections and Givens Rotations; 5.2 Gram-Schmidt orthonormalization; 5.3 QR Decomposition; 5.4 Least Square Fitting ...