Search results for: matrix multiplication
Number of results: 385488
This thesis discusses the matrix container template I implemented as part of the STXXL library for very large data sets. Because it is designed for matrices too big to be held in internal memory, algorithms and data structures are chosen to be efficient for external memory operation. Transposition, addition, and scalar multiplication are easy; therefore their description is kept brief. Matrix m...
In a large-scale and distributed matrix multiplication problem C = AB, where C ∈ R^{r×t}, coded computation plays an important role in effectively dealing with “stragglers” (distributed computations that may get delayed due to a few slow or faulty processors). However, existing coded schemes could destroy the significant sparsity that exists in large-scale machine learning problems, and could resul...
Matrices of integers modulo a small prime can be compressed by storing several entries into a single machine word. Modular addition is performed by addition and possibly subtraction of a word containing several times the modulus. We show how modular multiplication can also be performed. In terms of arithmetic operations, the gain over classical matrix multiplication is equal to the number of in...
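As a rough illustration of the word-packing idea described above, the sketch below packs several small residues into one integer and adds them in a single word operation. The parameters (modulus p = 5, 8-bit fields, 4 entries per word) are arbitrary choices for illustration, not taken from the paper, and the word-level correction the abstract alludes to is replaced by a simple per-field reduction for clarity:

```python
# Hypothetical packing parameters: modulus, bits per field, entries per word.
P, BITS, K = 5, 8, 4
MASK = (1 << BITS) - 1

def pack(entries):
    """Pack K residues mod P into consecutive BITS-wide fields of one integer."""
    word = 0
    for i, e in enumerate(entries):
        word |= (e % P) << (i * BITS)
    return word

def unpack(word):
    """Extract the K fields back into a list of integers."""
    return [(word >> (i * BITS)) & MASK for i in range(K)]

def packed_add(a, b):
    """Add all K entries with a single word addition.

    Each fieldwise sum is at most 2*(P-1) < 2**BITS, so no carry spills into
    the neighboring field; the fields are then reduced mod P individually.
    """
    s = a + b
    return pack([v % P for v in unpack(s)])
```

For example, adding the packed vectors [3, 4, 2, 1] and [4, 4, 0, 3] yields [2, 3, 2, 4] componentwise mod 5. The paper's contribution, performing the multiplication itself on packed words, is not reproduced here.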
These notes started their life as a lecture given at the Toronto Student Seminar on February 9, 2012. The material is taken mostly from the classic paper by Coppersmith and Winograd [CW]. Other sources are §15.7 of Algebraic Complexity Theory [ACT], Stothers’s thesis [Sto], V. Williams’s recent paper [Wil], and the paper by Cohn et al. [CKSU]. Starred sections are the ones we didn’t have time t...
In modern clustering environments where the memory hierarchy has many layers (distributed memory, shared memory, cache, etc.), an important question is how to fully utilize all available resources and identify the most dominant layer for a given computation. When combining algorithms across all layers, what would be the best method to get the best performance out of all the resources we h...
This document describes techniques for speeding up matrix multiplication on some high-performance computer architectures, including the IBM RS-6000, the IBM 3090/600S-VF, the MIPS RC3240 and RC6280, the Stardent 3040, and the Sun SPARCstation. The methods illustrate general principles that can be applied to the inner loops of scientific code.
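A principle common to most such architecture-specific techniques is loop blocking (tiling) of the inner loops to improve locality. A minimal sketch, assuming square n×n matrices stored as lists of rows; the block size `bs` and the loop order are illustrative choices, not taken from the document:

```python
def blocked_matmul(A, B, n, bs=32):
    """Compute C = A * B for n x n matrices using bs x bs tiles.

    Tiling keeps the working set of each inner triple loop small enough to
    stay resident in a fast memory layer (cache, registers); the results are
    accumulated into C across the kk tiles.
    """
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i][k]  # hoisted: reused across the j loop
                        for j in range(jj, min(jj + bs, n)):
                            C[i][j] += a * B[k][j]
    return C
```

On real hardware the payoff depends on matching `bs` to the cache sizes of the target machine, which is exactly the kind of tuning the document describes per architecture.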
A novel parallel algorithm for matrix multiplication is presented. The hyper-systolic algorithm makes use of a one-dimensional processor abstraction. The procedure can be implemented on all types of parallel systems. It can handle matrix-vector multiplications as well as transposed matrix products.
Strassen’s and Winograd’s algorithms for n × n matrix multiplication are investigated and compared with the normal algorithm. The normal algorithm requires n^3 + O(n^2) multiplications and about the same number of additions. Winograd’s algorithm almost halves the number of multiplications at the expense of more additions. Strassen’s algorithm reduces the total number of operations to O(n^2.82) by ...
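Strassen’s scheme at the base level replaces the eight multiplications of a 2×2 product with seven, at the cost of extra additions; applied recursively to blocks, this yields the O(n^2.82) bound. A sketch of the 2×2 step using the conventional seven products (variable names are the standard ones, not from the abstract):

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8.

    In the recursive algorithm a, b, ..., h would be submatrix blocks and
    * would be a recursive call; scalars are used here for illustration.
    """
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]
```

Counting operations at this level (7 multiplications, 18 additions/subtractions) versus the normal method (8 multiplications, 4 additions) makes the trade-off discussed in the abstract concrete.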