Multilevel communication optimal LU and QR factorizations for hierarchical platforms

Authors

  • Laura Grigori
  • Mathias Jacquelin
  • Amal Khabou
Abstract

This study focuses on the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We first introduce a new model called Hierarchical Cluster Platform (HCP), encapsulating the characteristics of such platforms. The focus is set on reducing the communication requirements of the studied algorithms at each level of the hierarchy. Lower bounds on communications are therefore extended with respect to the HCP model. We then introduce multilevel LU and QR algorithms tailored for those platforms, and provide a detailed performance analysis. We also provide a set of numerical experiments and performance predictions demonstrating the need for such algorithms on large platforms.

Key-words: QR, LU, exascale, hierarchical platforms

∗ INRIA Paris Rocquencourt, B.P. 105, F-78153 Le Chesnay Cedex, France
† UPMC Univ Paris 6, CNRS UMR 7598, Laboratoire Jacques-Louis Lions, F-75005, Paris, France

Similar resources

Performance Predictions of Multilevel Communication Optimal LU and QR Factorizations on Hierarchical Platforms

In this paper we study the performance of two classical dense linear algebra algorithms, the LU and the QR factorizations, on multilevel hierarchical platforms. We note that we focus on multilevel QR factorization, and give a brief description of the multilevel LU factorization. We first introduce a performance model called Hierarchical Cluster Platform (Hcp), encapsulating the characteristics ...

Communication-optimal Parallel and Sequential QR and LU Factorizations

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform and just as stable as Householder QR. We prove optimality by deriving new lower bounds for the number of multiplications done by “non-Strassen-like” QR, and using these in known communication lower bounds that are proportional to ...

Communication-optimal Parallel and Sequential Cholesky Decomposition

Numerical algorithms have two kinds of costs: arithmetic and communication, by which we mean either moving data between levels of a memory hierarchy (in the sequential case) or over a network connecting processors (in the parallel case). Communication costs often dominate arithmetic costs, so it is of interest to design algorithms minimizing communication. In this paper we first extend known lo...

Communication-optimal parallel and sequential QR and LU factorizations: theory and practice

We present parallel and sequential dense QR factorization algorithms that are both optimal (up to polylogarithmic factors) in the amount of communication they perform, and just as stable as Householder QR. Our first algorithm, Tall Skinny QR (TSQR), factors m × n matrices in a one-dimensional (1-D) block cyclic row layout, and is optimized for m ≫ n. Our second algorithm, CAQR (Communication-Avoi...

Implementing Communication-optimal Parallel and Sequential QR Factorizations

We present parallel and sequential dense QR factorization algorithms for tall and skinny matrices and general rectangular matrices that both minimize communication, and are as stable as Householder QR. The sequential and parallel algorithms for tall and skinny matrices lead to significant speedups in practice over some of the existing algorithms, including LAPACK and ScaLAPACK, for example up t...

Journal:
  • CoRR

Volume: abs/1303.5837

Publication date: 2013