High-Performance Tensor Contraction without Transposition
Abstract
Tensor computations, in particular tensor contraction (TC), are important kernels in many scientific computing applications (SCAs). Due to the fundamental similarity of TC to matrix multiplication (MM) and to the availability of optimized implementations such as the BLAS, tensor operations have traditionally been implemented in terms of BLAS operations, incurring both a performance and a storage overhead. Instead, we implement TC using the much more flexible BLIS framework, which allows reshaping of the tensor to be fused with internal partitioning and packing operations, requiring no explicit reshaping operations or additional workspace. This implementation achieves performance approaching that of MM, and in some cases considerably higher than that of traditional TC. Our implementation also supports multithreading using an approach identical to that used for MM in BLIS, with similar performance characteristics. The complexity of managing tensor-to-matrix transformations is also handled automatically in our approach, greatly simplifying its use in SCAs.
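The idea of treating a tensor as a logical matrix without first transposing it can be illustrated with a minimal pure-Python sketch. It is not the BLIS implementation or its API: the function names (`strides_for`, `offsets`, `contract`) and the fixed contraction C[a,b,c] = sum_d A[a,d,c] * B[d,b] are assumptions chosen for illustration. Instead of permuting A so that its free indices (a, c) become contiguous, the sketch precomputes small offset vectors for the logical rows, columns, and inner dimension, and the matrix-multiplication loops read and write the tensors in place through those offsets.

```python
from itertools import product


def strides_for(shape):
    """Row-major strides for a dense tensor of the given shape."""
    strides, step = [], 1
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return strides[::-1]


def offsets(shape, strides, axes):
    """Memory offsets contributed by `axes`, enumerated in row-major order
    over those axes only. Each entry addresses one logical row or column."""
    ranges = (range(shape[ax]) for ax in axes)
    return [sum(i * strides[ax] for i, ax in zip(idx, axes))
            for idx in product(*ranges)]


def contract(a, a_shape, b, b_shape, c_shape):
    """Compute C[a,b,c] = sum_d A[a,d,c] * B[d,b] on flat row-major lists.

    Illustrative only: the logical matrix view is M = (a, c), N = (b),
    K = (d), and no tensor is ever transposed or copied.
    """
    sa, sb, sc = strides_for(a_shape), strides_for(b_shape), strides_for(c_shape)

    a_m = offsets(a_shape, sa, (0, 2))   # rows of A: free indices (a, c)
    a_k = offsets(a_shape, sa, (1,))     # inner dimension of A: index d
    b_k = offsets(b_shape, sb, (0,))     # inner dimension of B: index d
    b_n = offsets(b_shape, sb, (1,))     # columns of B: free index b
    c_m = offsets(c_shape, sc, (0, 2))   # rows of C: indices (a, c)
    c_n = offsets(c_shape, sc, (1,))     # columns of C: index b

    size = 1
    for extent in c_shape:
        size *= extent
    c = [0.0] * size

    # Plain triple loop of matrix multiplication, addressed through offsets.
    for m in range(len(a_m)):
        for n in range(len(b_n)):
            acc = 0.0
            for k in range(len(a_k)):
                acc += a[a_m[m] + a_k[k]] * b[b_k[k] + b_n[n]]
            c[c_m[m] + c_n[n]] = acc
    return c
```

A transposition-based approach would instead copy A into (a, c, d) order, call a matrix multiplication, and copy the result back into (a, b, c) order, which requires workspace the size of the tensors. Here the only extra storage is the offset vectors, whose lengths are the logical matrix dimensions.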
Similar resources
Design of a high-performance GEMM-like Tensor-Tensor Multiplication
We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach to tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM). The critical insight behind GETT is the identification of three index sets, involved in the tensor contraction, which enable us to systematically reduce an arbitrary tensor contraction to loops around a h...
A massively parallel tensor contraction framework for coupled-cluster computations
Precise calculation of molecular electronic wavefunctions by methods such as coupled-cluster requires the computation of tensor contractions, the cost of which has polynomial computational scaling with respect to the system and basis set sizes. Each contraction may be executed via matrix multiplication on a properly ordered and structured tensor. However, data transpositions are often needed to...
A High-Level Approach to Synthesis of High-Performance Codes for Quantum Chemistry: The Tensor Contraction Engine
This paper provides an overview of a program synthesis system for a class of quantum chemistry computations. These computations are expressible as a set of tensor contractions and arise in electronic structure modeling. The input to the system is a high-level specification of the computation, from which the system can synthesize high-performance parallel code tailored to the characteristics o...
Audiometric findings with voluntary tensor tympani contraction
BACKGROUND: Tensor tympani contraction may have a "signature" audiogram. This study demonstrates audiometric findings during voluntary tensor tympani contraction. METHODS: Five volunteers possessing the ability to voluntarily contract their tensor tympani muscles were identified and enrolled. Tensor tympani contraction was confirmed with characteristic tympanometry findings. Study subjects unde...
Performance optimization of tensor contraction expressions for many-body methods in quantum chemistry.
Complex tensor contraction expressions arise in accurate electronic structure models in quantum chemistry, such as the coupled cluster method. This paper addresses two complementary aspects of performance optimization of such tensor contraction expressions. Transformations using algebraic properties of commutativity and associativity can be used to significantly decrease the number of arithmeti...
Journal:
- SIAM J. Scientific Computing
Volume: 40
Publication date: 2018