Cache-aware Performance Modeling and Prediction for Dense Linear Algebra

Authors

  • Elmar Peise
  • Paolo Bientinesi
Abstract

Countless applications cast their computational core in terms of dense linear algebra operations. These operations can usually be implemented by combining the routines offered by standard linear algebra libraries such as BLAS and LAPACK, and typically each operation can be obtained in many alternative ways. Interestingly, identifying the fastest implementation without executing it is a challenging task even for experts. An equally challenging task is that of tuning each routine to performance-optimal configurations. Indeed, the problem is so difficult that even the default values provided by the libraries are often considerably suboptimal; as a solution, one normally has to resort to executing and timing the routines, driven by some form of parameter search. In this paper, we discuss a methodology to solve both problems: identifying the best performing algorithm within a family of alternatives, and tuning algorithmic parameters for maximum performance; in both cases, we do not execute the algorithms themselves. Instead, our methodology relies on timing and modeling the computational kernels underlying the algorithms, and on a technique for tracking the contents of the CPU cache. In general, our performance predictions allow us to tune dense linear algebra algorithms to within a few percent of the best attainable results, thus allowing computational scientists and code developers alike to efficiently optimize their linear algebra routines and codes.
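The idea described in the abstract (predict an algorithm's runtime from models of its kernels plus a record of what is in cache, then pick the best variant without running it) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `LRUCacheTracker`, `kernel_time`, `predict`, `blocked_calls`, and `best_block_size` are hypothetical, the rates and per-call overhead are made-up placeholders rather than measured values, and the LRU policy is a simplification of real cache behavior.

```python
from collections import OrderedDict


class LRUCacheTracker:
    """Toy cache model: remembers which named operands are resident (LRU)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.resident = OrderedDict()  # operand name -> size in bytes

    def touch(self, name, size):
        """Record an access; return True if the operand was already cached."""
        hit = name in self.resident
        self.resident.pop(name, None)
        if size <= self.capacity:
            self.resident[name] = size  # most recently used goes last
            while sum(self.resident.values()) > self.capacity:
                self.resident.popitem(last=False)  # evict least recently used
        return hit


def kernel_time(flops, warm, overhead=1.0e-4, warm_rate=8.0e9, cold_rate=5.0e9):
    """Placeholder kernel model; real models would be fitted to measured timings."""
    return overhead + flops / (warm_rate if warm else cold_rate)


def predict(calls, cache_bytes):
    """Sum kernel estimates over a call sequence without running any kernel."""
    cache = LRUCacheTracker(cache_bytes)
    total = 0.0
    for flops, operands in calls:
        # Touch every operand (no short-circuit) so the cache state stays consistent.
        hits = [cache.touch(name, size) for name, size in operands]
        total += kernel_time(flops, warm=all(hits))
    return total


def blocked_calls(n, b):
    """Hypothetical blocked algorithm: per step, a panel kernel then an update kernel."""
    calls = []
    for i in range(0, n, b):
        panel = (("panel", i), 8 * n * b)  # n-by-b block of doubles
        calls.append((2.0 * n * b * b, [panel]))
        calls.append((2.0 * n * n * b, [panel, ("trailing", 8 * n * n)]))
    return calls


def best_block_size(n, candidates, cache_bytes):
    """Pick the candidate block size with the smallest predicted runtime."""
    return min(candidates, key=lambda b: predict(blocked_calls(n, b), cache_bytes))
```

For example, `best_block_size(1024, [8, 32, 128], 64 * 2**20)` ranks three block sizes using only the model. In a realistic setting the placeholder `kernel_time` would be replaced by models fitted to timings of each kernel in isolation, which is where the accuracy of such predictions would come from.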

Similar resources

Performance Modeling and Prediction for Dense Linear Algebra

This dissertation introduces measurement-based performance modeling and prediction techniques for dense linear algebra algorithms. As a core principle, these techniques avoid executions of such algorithms entirely, and instead predict their performance through runtime estimates for the underlying compute kernels. For a variety of operations, these predictions make it possible to quickly select the fastest...

A Study on the Influence of Caching: Sequences of Dense Linear Algebra Kernels

It is universally known that caching is critical to attain high-performance implementations: In many situations, data locality (in space and time) plays a bigger role than optimizing the (number of) arithmetic floating point operations. In this paper, we show evidence that at least for linear algebra algorithms, caching is also a crucial factor for accurate performance modeling and performance p...

Performance Modeling Tools for Parallel Sparse Linear Algebra Computations

We developed a Performance Modeling Tools (PMTOOLS) library to enable simulation-based performance modeling for parallel sparse linear algebra algorithms. The library includes micro-benchmarks for calibrating the system’s parameters, functions for collecting and retrieving performance data, and a cache simulator for modeling the detailed memory system activities. Using these tools, we have buil...

Power-aware Speed-up for Multithreaded Numerical Linear Algebraic Solvers on Chip Multicore Processors

With the advent of multicore chips new parallel computing metrics and models have become essential for redesigning traditional scientific application libraries tuned to a single chip. In this paper we evolve metrics specific to generalized chip multicore processors (CMP) and use them for parallel performance modeling of numerical linear algebra routines that are commonly available as shared obj...

Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch

In this paper, we introduce a concept called algorithmic prefetching, for exploiting some of the features of the IBM RISC System/6000 computer. Algorithmic prefetching denotes changing algorithm A to algorithm B, which contains additional steps to move data from slower levels of memory to faster levels, with the aim that algorithm B outperform algorithm A. The objective of algorithmic prefetc...

Journal:
  • CoRR

Volume abs/1409.8602

Publication date 2014