Achieving Numerical Accuracy and High Performance using Recursive Tile LU Factorization
Authors
Abstract
The LU factorization is an important numerical algorithm for solving systems of linear equations in science and engineering, and is characteristic of many dense linear algebra computations. It has even become the de facto numerical algorithm implemented within the LINPACK benchmark, which ranks the most powerful supercomputers in the world as collected by the TOP500 website. In this context, the challenge in developing new algorithms for the scientific community lies in combining two goals: achieving high performance and maintaining the accuracy of the numerical algorithm. This paper proposes a novel approach for computing the LU factorization in parallel on multicore architectures, which not only improves overall performance but also sustains the numerical quality of the standard LU factorization algorithm with partial pivoting. While the update of the trailing submatrix is computationally intensive and highly parallel, the inherently problematic portion of the LU factorization is the panel factorization, because it is memory-bound and because selecting the appropriate pivots is an atomic operation. Our approach uses a parallel, fine-grained recursive formulation of the panel factorization step and implements the update of the trailing submatrix with the tile algorithm. Based on conflict-free partitioning of the data and lockless synchronization mechanisms, our implementation lets the overall computation flow naturally without contention. The dynamic runtime system QUARK is then able to schedule tasks with heterogeneous granularities and to transparently introduce algorithmic lookahead. The performance of our implementation is competitive with currently available software packages and libraries. In particular, it is up to 40% faster than the equivalent Intel MKL routine and up to three times faster than LAPACK with multithreaded Intel MKL BLAS.
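The recursive panel factorization mentioned above can be illustrated with a small sequential sketch. It assumes a dense matrix stored as a Python list of row lists and follows the classic recursive LU formulation with partial pivoting (similar in structure to LAPACK's `DGETRF2`): split the panel's columns in half, factor the left half, update the right half, then factor the right half. It is not the authors' parallel, lock-free implementation, and it omits the tile-based trailing update and QUARK scheduling. The names `recursive_panel_lu`, `lu_factor`, and `reconstruct` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def recursive_panel_lu(A: Matrix, k: int, n: int, piv: List[int]) -> None:
    """Factor columns k..k+n-1 of A (rows k..m-1) in place with partial pivoting.

    On return the strict lower part holds L (unit diagonal implied), the upper
    part holds U, and piv[j] is the row swapped with row j at step j (0-based).
    Assumes len(A) >= k + n.
    """
    m = len(A)
    if n == 1:
        # Base case: pick the entry of largest magnitude in column k as the pivot.
        p = max(range(k, m), key=lambda i: abs(A[i][k]))
        piv[k] = p
        # Swapping whole rows applies the interchange to every column at once,
        # standing in for separate row-swap passes on the left and right blocks.
        A[k], A[p] = A[p], A[k]
        if A[k][k] != 0.0:
            for i in range(k + 1, m):
                A[i][k] /= A[k][k]  # multipliers of L, all of magnitude <= 1
        return

    n1 = n // 2
    # 1. Recursively factor the left half of the panel.
    recursive_panel_lu(A, k, n1, piv)
    # 2. A12 <- L11^{-1} A12 by forward substitution (L11 is unit lower triangular).
    for i in range(k, k + n1):
        for j in range(k + n1, k + n):
            A[i][j] -= sum(A[i][l] * A[l][j] for l in range(k, i))
    # 3. Schur complement update of the right half: A22 <- A22 - A21 A12.
    for i in range(k + n1, m):
        for j in range(k + n1, k + n):
            A[i][j] -= sum(A[i][l] * A[l][j] for l in range(k, k + n1))
    # 4. Recursively factor the updated right half.
    recursive_panel_lu(A, k + n1, n - n1, piv)


def lu_factor(A: Matrix) -> Tuple[Matrix, List[int]]:
    """Return (packed L\\U factors, pivots) for an m x n matrix with m >= n."""
    F = [[float(x) for x in row] for row in A]
    n = len(F[0])
    piv = [0] * n
    recursive_panel_lu(F, 0, n, piv)
    return F, piv


def reconstruct(F: Matrix, piv: List[int], A: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (P A, L U) so that the factorization can be checked."""
    m, n = len(F), len(F[0])
    PA = [[float(x) for x in row] for row in A]
    for j, p in enumerate(piv):  # replay the interchanges in order
        PA[j], PA[p] = PA[p], PA[j]
    LU = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # L[i][l] = F[i][l] for l < i and 1 for l == i; U[l][j] = F[l][j] for l <= j.
            LU[i][j] = sum((1.0 if l == i else F[i][l]) * F[l][j]
                           for l in range(min(i, j) + 1))
    return PA, LU


if __name__ == "__main__":
    A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
    F, piv = lu_factor(A)
    print(piv)  # [2, 2, 2]
```

Halving the columns at each level turns most of the panel's work into block operations (the triangular solve and the Schur complement update) rather than column-by-column rank-1 updates, while every pivot is still chosen from a fully updated column, so the result matches standard partial pivoting.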
Similar resources
Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting
The LU factorization is an important numerical algorithm for solving systems of linear equations in science and engineering and is characteristic of many dense linear algebra computations. For example, it has become the de facto numerical algorithm implemented within the LINPACK benchmark to rank the most powerful supercomputers in the world, collected by the TOP500 website. Multicore process...
Exploiting Fine-Grain Parallelism in Recursive LU Factorization
The LU factorization is an important numerical algorithm for solving systems of linear equations in science and engineering and is characteristic of many dense linear algebra computations. It has even become the de facto numerical algorithm implemented within the LINPACK benchmark to rank the most powerful supercomputers in the world, collected by the TOP500 website. In this context, the challen...
Parallel Computation of Echelon Forms
We propose efficient parallel algorithms and implementations on shared memory architectures of LU factorization over a finite field. Compared to the corresponding numerical routines, we have identified three main difficulties specific to linear algebra over finite fields. First, the arithmetic complexity could be dominated by modular reductions. Therefore, it is mandatory to delay as much as po...
Tiled Algorithms for Matrix Computations on Multicore Architectures
Current computer architecture has moved towards the multi/many-core structure. However, the algorithms in the current sequential dense numerical linear algebra libraries (e.g. LAPACK) do not parallelize well on multi/many-core architectures. A new family of algorithms, the tile algorithms, has recently been introduced to circumvent this problem. Previous research has shown that it is possible t...
Towards a Parallel Tile LDL Factorization for Multicore Architectures
The increasing number of cores in modern architectures requires the development of new algorithms as a means to achieving concurrency and hence scalability. This paper presents an algorithm to compute the LDL^T factorization of symmetric indefinite matrices without taking pivoting into consideration. The algorithm, based on the factorizations presented by Buttari et al. [11], represents operatio...