Implementing Strassen's Algorithm with BLIS

Authors

  • Jianyu Huang
  • Tyler M. Smith
  • Greg M. Henry
  • Robert A. van de Geijn
Abstract

We dispel “street wisdom” regarding the practical implementation of Strassen’s algorithm for matrix-matrix multiplication (DGEMM). Conventional wisdom: it is only practical for very large matrices. Our implementation is practical for small matrices. Conventional wisdom: the matrices being multiplied should be relatively square. Our implementation is practical for rank-k updates, where k is relatively small (a shape of importance for libraries like LAPACK). Conventional wisdom: it inherently requires substantial workspace. Our implementation requires no workspace beyond buffers already incorporated into conventional high-performance DGEMM implementations. Conventional wisdom: a Strassen DGEMM interface must pass in workspace. Our implementation requires no such workspace and can be plug-compatible with the standard DGEMM interface. Conventional wisdom: it is hard to demonstrate speedup on multi-core architectures. Our implementation demonstrates speedup over conventional DGEMM even on an Intel® Xeon Phi coprocessor utilizing 240 threads. We show how a distributed memory matrix-matrix multiplication also benefits from these advances.
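
For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal textbook sketch of one level of Strassen's method, which forms the product from seven block multiplications instead of eight. It is not the authors' BLIS-based implementation: the helper names (`matmul`, `add`, `sub`, `split`, `strassen_one_level`) are hypothetical, matrices are plain Python lists of lists, and every dimension is assumed to be even.

```python
# Textbook one-level Strassen sketch (NOT the paper's BLIS implementation).
# All function names are hypothetical and exist only for illustration.

def matmul(A, B):
    """Conventional product of two list-of-lists matrices."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(inner)) for j in range(cols)]
            for row in A]

def add(X, Y):
    """Elementwise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def sub(X, Y):
    """Elementwise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def split(M):
    """Partition M into four quadrants (assumes even row and column counts)."""
    r, c = len(M) // 2, len(M[0]) // 2
    return ([row[:c] for row in M[:r]], [row[c:] for row in M[:r]],
            [row[:c] for row in M[r:]], [row[c:] for row in M[r:]])

def strassen_one_level(A, B):
    """Compute A @ B with seven block products instead of eight."""
    A00, A01, A10, A11 = split(A)
    B00, B01, B10, B11 = split(B)

    # The seven Strassen products.
    M0 = matmul(add(A00, A11), add(B00, B11))
    M1 = matmul(add(A10, A11), B00)
    M2 = matmul(A00, sub(B01, B11))
    M3 = matmul(A11, sub(B10, B00))
    M4 = matmul(add(A00, A01), B11)
    M5 = matmul(sub(A10, A00), add(B00, B01))
    M6 = matmul(sub(A01, A11), add(B10, B11))

    # Recombine the products into the quadrants of C.
    C00 = add(sub(add(M0, M3), M4), M6)
    C01 = add(M2, M4)
    C10 = add(M1, M3)
    C11 = add(add(sub(M0, M1), M2), M5)

    # Stitch the quadrants back into one matrix.
    top = [left + right for left, right in zip(C00, C01)]
    bottom = [left + right for left, right in zip(C10, C11)]
    return top + bottom
```

The sketch also accepts rectangular operands, such as the rank-k update shape mentioned in the abstract, provided all dimensions are even. Unlike the implementation described in the abstract, it allocates temporaries for every block sum and product, which is exactly the workspace overhead the authors report avoiding.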

Related resources

Opportunities for Parallelism in Matrix Multiplication

BLIS is a new framework for rapid instantiation of the BLAS. We describe how BLIS extends the “GotoBLAS approach” to implementing matrix multiplication (gemm). While gemm was previously implemented as three loops around an inner kernel, BLIS exposes two additional loops within that inner kernel, casting the computation in terms of the BLIS micro-kernel so that porting gemm becomes a matter of c...

Architecture-efficient Strassen's Matrix Multiplication: A Case Study of Divide-and-conquer Algorithms

Many fast algorithms in arithmetic complexity have hierarchical or recursive structures that make efficient implementations on high performance computers with memory hierarchies non-trivial. In this paper we present our findings on efficient implementation of Strassen's algorithm [17] for the ubiquitous operation of matrix multiplication as a model for a class of recursive algorithms. In comparison to...

Strassen's 2x2 matrix multiplication algorithm: A conceptual perspective

Despite its importance, all proofs of the correctness of Strassen's famous 1969 algorithm to multiply two 2x2 matrices with only seven multiplications involve some more or less tedious calculations such as explicitly multiplying specific 2x2 matrices, expanding expressions to cancel terms with opposing signs, or expanding tensors over the standard basis. This is why the proof is nontrivial to m...

Naïve Matrix Multiplication versus Strassen Algorithm in Multi-thread Environment

In the first section, we give the mathematical reasoning behind Strassen's algorithm for matrix multiplication. First, a naïve method for matrix multiplication is explained, and then it is extended to the more advanced Strassen's method. In the following section, a description of the programming language and framework is given, with an explanation of the algorithm's implementation. Last...

A BSP Realisation of Strassen's Algorithm

An efficient BSP realisation of Strassen's matrix multiplication algorithm is described. 1 Strassen's Algorithm. Let A and B be two n × n matrices and consider the problem of computing C = A B. We can regard the matrices A, B, C as each composed of four n/2 × n/2 submatrices. For example, A = [A_00 A_01; A_10 A_11]. If the submatrices of B and C are described in the same way then we have C_ij = A_i0 B_0j + A_i1 B_1j for all i,...

Journal:
  • CoRR

Volume: abs/1605.01078

Publication date: 2016