Tensor Multiplication on Parallel Computers
Author
Abstract
One disadvantage of our approach is that each processor must work on a large piece of the problem for a long time, which increases the probability that a single processor failure will sabotage the computation. Another limitation is that the number of processors must be increased in potentially large increments in order to take advantage of larger clusters. (The ideal number of processors is an integer multiple or divisor of the number of rows in u.)
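The constraint in the parenthetical can be sketched as a small helper. This is an illustrative sketch only, assuming the stated rule (a processor count that divides, or is an integer multiple of, the row count of u balances the work); the function name and parameters are hypothetical, not from the paper.

```python
def ideal_processor_counts(num_rows, max_procs):
    """Processor counts p <= max_procs that either divide the number
    of rows in u or are an integer multiple of it (illustrative rule
    taken from the abstract, not the paper's actual scheduler)."""
    return [p for p in range(1, max_procs + 1)
            if num_rows % p == 0 or p % num_rows == 0]

# For a matrix u with 12 rows and a cluster of up to 40 processors:
print(ideal_processor_counts(12, 40))
# → [1, 2, 3, 4, 6, 12, 24, 36]
```

Note the gaps in the list: between 12 and 24 there is no admissible count, which is the "large increments" limitation the abstract mentions.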
Similar Resources
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure, so a method that can run on all structures, especially the Fibonacci Hypercube, is necessary for parallel matr...
Experimental Evaluation of BSP Programming Libraries
The model of bulk-synchronous parallel computation (BSP) helps to implement portable general-purpose algorithms while keeping performance predictable on different parallel computers. Nevertheless, when programming in 'BSP style', the running time of an algorithm's implementation can depend strongly on the underlying communications library. In this study, an overview of existing approache...
Parallel Implementation of Multiple-Precision Arithmetic and 1,649,267,440,000 Decimal Digits of π Calculation
We present efficient parallel algorithms for multiple-precision arithmetic operations on more than several million decimal digits on distributed-memory parallel computers. A parallel implementation of floating-point real FFT-based multiplication is used, because multiplication is the key operation in fast multiple-precision arithmetic. We also parallelized the operation of releasing propagated carr...
Generalized Hyper-Systolic Algorithm
We generalize the hyper-systolic algorithm proposed in [1] for abstract data structures on massively parallel computers with np processors. For a problem of size V, the communication complexity of the hyper-systolic algorithm is proportional to √(np)·V, compared with np·V in the systolic case. The implementation technique is explained in detail, and the example of the parallel matrix-matrix mu...
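The complexity claim in this excerpt can be illustrated numerically. A minimal sketch, assuming the stated proportionalities (constant factors omitted; the function names are illustrative, not from the paper):

```python
import math

def systolic_comm(n_p, V):
    # Systolic communication cost, proportional to np * V.
    return n_p * V

def hyper_systolic_comm(n_p, V):
    # Hyper-systolic communication cost, proportional to sqrt(np) * V.
    return math.sqrt(n_p) * V

# The speedup factor is sqrt(np), independent of the problem size V.
n_p, V = 256, 1000
print(systolic_comm(n_p, V) / hyper_systolic_comm(n_p, V))
# → 16.0
```

With 256 processors the hyper-systolic scheme communicates a factor of √256 = 16 less data, and the advantage grows with the processor count.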