Effectiveness of Message Strip-mining for Regular and Irregular Communication

Authors

  • Akiyoshi Wakatani
  • Michael Wolfe
Abstract

Languages such as High Performance Fortran are used to implement parallel algorithms by distributing large data structures across a multicomputer system. To hide communication behind computation, we introduce an optimization scheme called message strip-mining. With this scheme, the communication overhead is almost completely overlapped with the subsequent computation. We have implemented the proposed scheme for the redistribution of arrays (regular communication) and for an executor for indirect access (irregular communication), and have achieved speedups of 3.5 and 2.6, respectively, for the redistribution of a 2560 × 2560 array and for an executor that collects data of size 5 × 10^5 per processor.
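The idea of overlapping communication with the subsequent computation can be illustrated with a minimal sketch. It is not the authors' implementation: the transfer is simulated by a background thread, and the names `fetch_strip`, `strip_mined_sum_of_squares`, and `strip_size` are hypothetical. The sketch only shows the loop structure, in which the transfer of strip k+1 is started before the computation on strip k begins.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_strip(remote, start, stop):
    """Hypothetical stand-in for receiving one strip of a large message."""
    return remote[start:stop]


def strip_mined_sum_of_squares(remote, strip_size):
    """Process `remote` strip by strip, fetching strip k+1 while computing on strip k."""
    if strip_size <= 0:
        raise ValueError("strip_size must be positive")
    # Split the single large transfer into strip boundaries.
    bounds = [(lo, min(lo + strip_size, len(remote)))
              for lo in range(0, len(remote), strip_size)]
    total = 0
    if not bounds:
        return total
    # One worker thread plays the role of the communication engine.
    with ThreadPoolExecutor(max_workers=1) as comm:
        in_flight = comm.submit(fetch_strip, remote, *bounds[0])
        for next_bounds in bounds[1:]:
            strip = in_flight.result()                     # wait for strip k
            in_flight = comm.submit(fetch_strip, remote, *next_bounds)  # start strip k+1
            total += sum(x * x for x in strip)             # compute on strip k
        # The last strip has no successor to overlap with.
        total += sum(x * x for x in in_flight.result())
    return total


print(strip_mined_sum_of_squares(list(range(10)), strip_size=4))  # prints 285
```

In a real multicomputer setting the background thread would be replaced by non-blocking sends and receives, and the strip size would trade per-message overhead against the amount of overlap. Those details are outside what this sketch models.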


Related resources

An Efficient Uniform Run-time Scheme for Mixed Regular-Irregular Applications

Almost all applications containing indirect array addressing (irregular accesses) have a substantial number of direct array accesses (regular accesses) too. A conspicuous percentage of these direct array accesses usually require inter-processor communication for the applications to run on a distributed memory multicomputer. This study highlights how lack of a uniform representation and lack of ...


Message Strip-Mining Heuristics for High Speed Networks

In this work we investigate how the compiler technique of message strip mining performs in practice on contemporary high performance networks. Message strip mining attempts to reduce the overall cost of communication in parallel programs by breaking up large message transfers into smaller ones that can be overlapped with computation. In practice, however, network resource constraints may negate...


A Data Reorganization Technique for Improving Data Locality of Irregular Applications in Software Distributed Shared Memory

Irregular applications are characterized by highly irregular and fine-grained data referencing patterns. When there is poor locality between the fine-grained data, serious false sharing can occur, which has largely contributed to poor performance of irregular applications on page-based software distributed shared memory (DSM) systems. Partitioning data in irregular applications to improve data local...


Optimizing Partitioned Global Address Space Programs for Cluster Architectures

By Wei-Yu Chen. Doctor of Philosophy in Computer Science, University of California, Berkeley. Professor Katherine A. Yelick, Chair. Unified Parallel C (UPC) is an example of a partitioned global address space language for high performance parallel computing. This programming model enables applications to be written in a s...


Parallelizing Irregular Applications through the YAPPA Compilation Framework

Modern High Performance Computing (HPC) clusters are composed of hundreds of nodes integrating multicore processors with advanced cache hierarchies. These systems can reach several petaflops of peak performance, but are optimized for floating point intensive applications, and regular, localizable data structures. The network interconnection of these systems is optimized for bulk, synchronous tra...



Publication date: 1994