Matrix Factorization Using Distributed Panels on the Fujitsu AP1000

Author

  • Peter Strazdins
Abstract

Dense linear algebra computations such as matrix factorization require the technique of 'block-partitioned algorithms' for their efficient implementation on memory-hierarchy processors. For scalar-based distributed memory multiprocessors, the register, cache and off-processor memory levels of the memory hierarchy all affect the optimal block-partition size for such algorithms. Most studies on matrix factorization and similar algorithms have assumed the block-partition size, or panel width, of the algorithm, ω, to be the same as the matrix distribution block size, r, where a rectangular block-cyclic matrix distribution is being employed. Here the choice of ω = r is essentially determined by the off-processor memory level of the memory hierarchy, with the value of ω being a tradeoff between communication startup overhead and load balance considerations. In this paper, we reexamine this assumption in the context of LU and Cholesky factorization of block-cyclic distributed matrices on scalar-based distributed memory multiprocessors, such as the Fujitsu AP1000. Here considerations of the register and cache levels of the hierarchy require a large ω. We find that the choice of ω, given ω = r, leads to a tradeoff between load balance and optimal use of the register and cache levels of the hierarchy (rather than communication startup), and that this tradeoff substantially limits performance. We then briefly describe 'distributed panels' versions of these algorithms, where generally ω > r, which effectively diminishes this tradeoff to an O(ω/N) fraction of the overall computation, where N is the matrix size. Two variants of these versions, one with single rows/columns being communicated, and one with single block rows/columns being communicated, are analyzed for their load balance properties. Results for the distributed panels versions of the algorithms on the Fujitsu AP1000, a scalar-based distributed memory multiprocessor, are given; they show significantly superior performance over the ω = r versions, with optimum performance achieved for r ≈ 1.
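The load-balance side of the tradeoff described in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes a one-dimensional block-cyclic distribution of matrix columns over p processor columns, and the function names (`owner`, `trailing_load`, `imbalance`) are hypothetical. It counts how many trailing columns each processor still owns once the first k columns have been factored, and compares the busiest processor with an ideal even share.

```python
from collections import Counter


def owner(j, r, p):
    """Processor column that owns global column j under a block-cyclic
    distribution with block size r over p processor columns (assumed layout)."""
    return (j // r) % p


def trailing_load(n, k, r, p):
    """Number of trailing columns k..n-1 owned by each of the p processors."""
    counts = Counter(owner(j, r, p) for j in range(k, n))
    return [counts.get(q, 0) for q in range(p)]


def imbalance(n, k, r, p):
    """Busiest processor's column count divided by the ideal even share.
    A value of 1.0 means the trailing columns are perfectly balanced."""
    load = trailing_load(n, k, r, p)
    return max(load) * p / sum(load)


if __name__ == "__main__":
    n, p, k = 64, 4, 16
    for r in (1, 4, 16):
        print(r, trailing_load(n, k, r, p), round(imbalance(n, k, r, p), 3))
```

In this toy setting (n = 64, p = 4, k = 16), a block size of r = 16 leaves one processor with no trailing columns, giving an imbalance of 4/3, while r = 1 keeps all four processors evenly loaded. This is consistent with the abstract's observation that small r favors load balance, while the register and cache levels favor a large panel width ω.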

Similar resources

Matrix Factorization using Distributed Panels on the Fujitsu AP1000

Department of Computer Science, Australian National University, Acton, ACT 0200, Australia. E-mail: [email protected] The results of the distributed panels versions of the algorithms on the scalar-based distributed memory multiprocessor the Fujitsu AP1000 are given, which give significantly superior performance for distributed panels versions over the ω = r versions, with optimum perfo...

A High Performance, Portable Distributed BLAS Implementation

In this paper, we give a report on recent developments for the Distributed BLAS (DBLAS) project. These include a powerful distributed matrix representation which yields a simple interface to the DBLAS, and the redesign of the DBLAS algorithms in terms of powerful 'spread' and 'reduce' matrix communication operations for reasons of programmability. The DBLAS codes achieve portability by supporting BLACS ...

Linear Algebra Research on the AP

This paper gives a report on various results of the Linear Algebra Project on the Fujitsu AP1000 in 1993. These include the general implementation of Distributed BLAS Level 3 subroutines (for the scattered storage scheme). The performance and user interface issues of the implementation are discussed. Implementations of Distributed BLAS-based LU Decomposition, Cholesky Factorization and Star Pro...

Integer Factorisation on the AP1000

We compare implementations of two integer factorisation algorithms, the elliptic curve method (ECM) and a variant of the Pollard “rho” method, on three machines (the Fujitsu AP1000, VP2200 and VPP500) with parallel and/or vector architectures. ECM is scalable and well suited for both vector and parallel architectures.

Implementing ML on the Fujitsu AP1000

The CAP ML project seeks to develop a version of ML that is suitable for use on a distributed memory multiprocessor architecture such as the Fujitsu AP1000. Language extensions are proposed that have been developed in conjunction with a programming methodology that is appropriate to that of a massively parallel computer whilst retaining a functional style. The implementation, which is based on...

Publication date: 1995