Performance of PDE solvers on a self-optimizing NUMA architecture

نویسندگان

  • Sverker Holmgren
  • Markus Nordén
  • Jarmo Rantakokko
  • Dan Wallin
چکیده

The performance of shared-memory (OpenMP) implementations of three different PDE solver kernels representing finite difference methods, finite volume methods, and spectral methods has been investigated. The experiments have been performed on a self-optimizing NUMA system, the Sun Orange prototype, using different data placement and thread scheduling strategies. The results show that correct data placement is very important for the performance for all solvers. However, the Orange system has a unique capababilty of automatically changing the data distribution at run time through both migration and replication of data. For reasonable large PDE problems, we find that the time to do this is negligible compared to the total solve time. Also, the performance after the migration and replication process has reached steady-state is the same as what is achieved if data is optimally placed at the beginning of the execution using hand tuning. This shows that, for the application studied, the self-optimizing features are successful, and shared memory code without explicit data distribution directives yields good performance.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Performance of High-Accuracy PDE Solvers on a Self-Optimizing NUMA Architecture

High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communication. Performance analysis of a non-trivial kernel representing a PDE solution algorithm has been carried out on a Sun WildFire computer. Here, different architecture, system and programming models can be studied. Th...

متن کامل

Performance Modelling for Parallel PDE Solvers on NUMA-Systems

A detailed model of the memory performance of a PDE solver running on a NUMA-system is set up. Due to the complexity of modern computers, such a detailed model inevitably is very complicated. Therefore, approximations are introduced that simplify the model and allows NUMA-systems and PDE solvers to be described conveniently. Using the simpli ed model, it is shown that PDE solvers using ordered ...

متن کامل

Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers

On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geographical locality, as the non-uniformity is a consequence of the physical distance between the cc-NUMA nodes. In this article, we compare the well established method of exploiting the rst-touch strategy using parallel in...

متن کامل

A NUMA Aware Scheduler for a Parallel Sparse Direct Solver

Over the past few years, parallel sparse direct solvers have made significant progress [1, 3, 4]. They are now able to solve efficiently real-life three-dimensional problems with several millions of equations. Since the last decade, most of the supercomputer architectures are based on clusters of SMP (Symmetric Multi-Processor) nodes. In [5], the authors proposed a hybrid MPI-thread implementat...

متن کامل

Analyzing Advanced PDE Solvers Through Simulation

By simulating a real computer it is possible to gain a detailed knowledge of the cache memory utilization of an application, e.g., a partial differential equation (PDE) solver. Using this knowledge, we can discover regions with intricate cache memory performance. Furthermore, this information makes it possible to identify performance bottlenecks. In this paper, we employ full system simulation ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Parallel Algorithms Appl.

دوره 17  شماره 

صفحات  -

تاریخ انتشار 2002