Reducing Synchronization Overheads in CG-type Parallel Iterative Solvers by Embedding Point-to-point Communications into Reduction Operations

Authors

  • R. Oguz Selvitopi
  • Cevdet Aykanat
Abstract

Parallel iterative solvers are widely used to solve large sparse linear systems of equations on large-scale parallel architectures. These solvers generally contain two different types of communication operations: point-to-point (P2P) and global collective communications. In this work, we present a computational reorganization method that exploits a property commonly found in Krylov subspace methods. This reorganization allows P2P and collective communications to be performed simultaneously. We use this opportunity to embed the content of the P2P messages into the messages exchanged in the collective communications in order to reduce the latency overhead of the solver. Experiments on two different supercomputers with up to 2048 processors show that the proposed latency-avoiding method exhibits superior scalability, especially as the number of processors increases.
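The embedding idea can be illustrated with a small single-process simulation. The sketch below is not taken from the paper: it assumes a recursive-doubling (butterfly) all-reduce over a power-of-two number of ranks and store-and-forward routing of the P2P payloads, and the function name `embedded_allreduce` and its arguments are hypothetical. It shows only that a reduction and a set of P2P deliveries can share the same exchange steps.

```python
def embedded_allreduce(local_values, p2p_messages):
    """Simulate an all-reduce that also carries P2P payloads (illustrative only).

    local_values: list of P numbers, one partial result per rank (assumed P = 2^k).
    p2p_messages: dict mapping (src_rank, dst_rank) -> payload.
    Returns (sums, inboxes, steps): the reduced value on every rank, the
    payloads delivered to every rank keyed by source, and the step count.
    """
    P = len(local_values)
    assert P > 0 and P & (P - 1) == 0, "sketch assumes a power-of-two rank count"

    sums = list(local_values)
    # held[r] holds the (src, dst, payload) triples currently stored at rank r.
    held = [[] for _ in range(P)]
    for (src, dst), payload in p2p_messages.items():
        held[src].append((src, dst, payload))

    steps = 0
    bit = 1
    while bit < P:
        new_sums = list(sums)
        outgoing = [[] for _ in range(P)]
        for r in range(P):
            partner = r ^ bit
            keep = []
            for msg in held[r]:
                # Forward a payload if its destination differs from r in this bit,
                # so it rides along with the partial sum sent to the partner.
                if (msg[1] ^ r) & bit:
                    outgoing[partner].append(msg)
                else:
                    keep.append(msg)
            held[r] = keep
            new_sums[r] = sums[r] + sums[partner]
        for r in range(P):
            held[r].extend(outgoing[r])
        sums = new_sums
        bit <<= 1
        steps += 1

    inboxes = [{src: payload for src, _dst, payload in held[r]} for r in range(P)]
    return sums, inboxes, steps


sums, inboxes, steps = embedded_allreduce(
    [1, 2, 3, 4], {(0, 3): "a", (2, 1): "b", (1, 0): "c"}
)
print(sums)     # [10, 10, 10, 10]
print(inboxes)  # [{1: 'c'}, {2: 'b'}, {}, {0: 'a'}]
print(steps)    # 2
```

In this sketch every rank takes part in exactly log2(P) exchanges regardless of how many P2P neighbors it has, which is where a latency saving could come from. The price is that payloads may be forwarded through intermediate ranks, so the total volume moved can grow.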

Similar resources

ASYNC Loop Constructs for Relaxed Synchronization

Conventional iterative solvers for partial differential equations impose strict data dependencies between each solution point and its neighbors. When implemented in OpenMP, they repeatedly execute barrier synchronization in each iterative step to ensure that data dependencies are strictly satisfied. We propose new parallel annotations to support an asynchronous computation model for iterative s...

Parallel, multigrain iterative solvers for hiding network latencies on MPPs and networks of clusters

Parallel iterative solvers are often the only means of solving large linear systems and eigenproblems. However, these solvers are usually implemented in a fine-grained manner and, when scaled to large numbers of processors on MPPs, can incur significant performance penalties due to synchronization overheads. This problem is exacerbated in clusters of workstations (COWs) and SMPs that are interco...

Achieving Portable High Performance for Iterative Solvers on Accelerators

Many supercomputers, clusters, and workstations today are equipped with accelerators such as graphics processing units (GPUs) and Intel’s many-integrated core architecture (MIC). While their highly parallel architectures are very efficient for dense linear algebra operations, particularly those which are compute-bound rather than limited by memory bandwidth, their use for iterative solvers such...

The Communication-Hiding Conjugate Gradient Method with Deep Pipelines

Krylov subspace methods are among the most efficient present-day solvers for large scale linear algebra problems. Nevertheless, classic Krylov subspace method algorithms do not scale well on massively parallel hardware due to the synchronization bottlenecks induced by the computation of dot products throughout the algorithms. Communication-hiding pipelined Krylov subspace methods offer increase...

Convergence of an Iterative Scheme for Multifunctions on Fuzzy Metric Spaces

Recently, Reich and Zaslavski have studied a new inexact iterative scheme for fixed points of contractive and nonexpansive multifunctions. In 2011, Aleomraninejad et al. generalized some of their results to Suzuki-type multifunctions. The study of iterative schemes for various classes of contractive and nonexpansive mappings is a central topic in fixed point theory. The importance of Banach ...

Publication date: 2014