On Configuring Distributed Memory Process Grids for Tensor Contraction Applications
Authors
Abstract
Tensor contractions are important computations in many domains, including quantum chemistry and big-data applications. Distributed-memory architectures, in which each of many processing elements has its own private memory, are used to solve large problems efficiently. On such systems, achieving high performance depends on the arrangement of the processing elements. The optimal configuration is typically a nontrivial function of the network topology and the application's implementation. In this paper, we investigate the relationship between network topology and the application's communication pattern, and develop a framework toward automatically determining a near-optimal arrangement of processing elements for a given application on a distributed-memory architecture.
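To make "arrangement of processing elements" concrete, the following is a minimal sketch and not the framework described in the abstract. It assumes a 2D process grid and a matrix-multiplication-shaped contraction C = A B with A of size m x k and B of size k x n, and it ranks grid shapes with a toy per-process communication volume in the style of a SUMMA-like algorithm. The function names and the cost model are hypothetical, and the model deliberately ignores network topology, which the abstract identifies as a key factor.

```python
from typing import List, Tuple


def grid_shapes(p: int) -> List[Tuple[int, int]]:
    """Return every (rows, cols) pair with rows * cols == p."""
    return [(r, p // r) for r in range(1, p + 1) if p % r == 0]


def toy_comm_volume(m: int, n: int, k: int, rows: int, cols: int) -> float:
    """Hypothetical cost: words each process receives for C = A @ B.

    Each process owns an (m/rows) x (n/cols) block of C and, over the
    whole computation, needs the (m/rows) x k panel of A for its grid row
    and the k x (n/cols) panel of B for its grid column.
    Assumption: topology, latency, and overlap are all ignored.
    """
    return (m / rows) * k + k * (n / cols)


def best_grid(p: int, m: int, n: int, k: int) -> Tuple[int, int]:
    """Pick the grid shape that minimizes the toy communication volume."""
    return min(grid_shapes(p), key=lambda s: toy_comm_volume(m, n, k, *s))


if __name__ == "__main__":
    # Square problem on 16 processes, then a tall problem on 16 processes.
    print(best_grid(16, 1000, 1000, 1000))
    print(best_grid(16, 4000, 1000, 500))
```

Under this toy model a square problem favors a square grid, while a tall output matrix favors more grid rows than columns. A topology-aware framework would replace `toy_comm_volume` with a cost that also reflects how grid neighbors map onto the physical network.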
Similar resources
Strassen's Algorithm for Tensor Contraction
Tensor contraction (TC) is an important computational kernel widely used in numerous applications. It is a multi-dimensional generalization of matrix multiplication (GEMM). While Strassen’s algorithm for GEMM is well studied in theory and practice, extending it to accelerate TC has not been previously pursued. Thus, we believe this to be the first paper to demonstrate how one can in practice sp...
A massively parallel tensor contraction framework for coupled-cluster computations
Precise calculation of molecular electronic wavefunctions by methods such as coupled-cluster requires the computation of tensor contractions, the cost of which has polynomial computational scaling with respect to the system and basis set sizes. Each contraction may be executed via matrix multiplication on a properly ordered and structured tensor. However, data transpositions are often needed to...
Image Registration Using Tensor Grids for Lung Ventilation Studies
In non-parametric image registration it is often not possible to work with the original resolution of the images due to high processing times and lack of memory. However, for some medical applications the information contained in the original resolution is crucial in certain regions of the image while being negligible in others. To adapt to this problem we will present an approach using tensor ...
A Methodology for Generating Efficient Disk-Based Algorithms from Tensor Product Formulas
In this paper, we address the issue of automatic generation of disk-based algorithms from tensor product formulas. Disk-based algorithms are required in scientific applications which work with large data sets that do not fit entirely into main memory. Tensor products have been used for designing and implementing block recursive algorithms on shared-memory, vector and distributed-memory multiproces...
A Debugger for Computational Grid Applications
The Portable Parallel/Distributed Debugger project at the NASA Ames Research Center has built a debugger for applications running on heterogeneous computational grids. It employs a client-server architecture to simplify the implementation, and its user interface has been designed to provide process control and state examination functions on computations with a large number of processes. The deb...