Vector Prefix and Reduction Computation on Coarse-Grained, Distributed-Memory Parallel Machines

Authors

Seungjo Bae

Dongmin Kim

Sanjay Ranka

Abstract

Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split algorithm, for vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines. Our algorithms are relatively architecture independent and can be used effectively in many applications such as Pack/Unpack, Array Prefix/Reduction Functions, and Array Combining Scatter Functions, which are defined in Fortran 90 and in High Performance Fortran. Experimental results on the CM-5 are presented.
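To make the two primitives concrete, the following is a minimal sequential sketch of what they compute, assuming the usual element-wise definition: each of p processors holds a vector of the same length, a reduction combines all p vectors position by position, and an inclusive prefix gives processor i the combination of vectors 0 through i. It simulates the processors with a list of lists and is not the direct or split algorithm of the paper; the function names and the sample data are hypothetical.

```python
from itertools import accumulate
from operator import add


def vector_reduction(vectors, op=add):
    """Combine all processors' vectors position by position into one vector."""
    result = list(vectors[0])
    for vec in vectors[1:]:
        result = [op(a, b) for a, b in zip(result, vec)]
    return result


def vector_prefix(vectors, op=add):
    """Inclusive prefix: row i is the combination of vectors 0..i."""
    columns = zip(*vectors)  # one column per vector position
    prefix_columns = [list(accumulate(col, op)) for col in columns]
    return [list(row) for row in zip(*prefix_columns)]


# Hypothetical local data for three simulated processors.
local_vectors = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

reduced = vector_reduction(local_vectors)
prefixes = vector_prefix(local_vectors)
```

With addition as the operator, the last row of the prefix result equals the reduction, which is a quick sanity check for any parallel implementation of the two primitives.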

Similar resources

A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure

The IP Lookup Process is a key bottleneck in routing due to the increase in routing table size, increasing traffic and migration to IPv6 addresses. The IP address lookup involves computation of the Longest Prefix Matching (LPM), for which existing solutions, such as BSD Radix Tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...

PACK/UNPACK on Coarse-Grained Distributed Memory Parallel Machines

PACK/UNPACK are Fortran 90/HPF array construction functions which derive new arrays from existing arrays. We present algorithms for performing these operations on coarse-grained parallel machines. Our algorithms are relatively architecture independent and can be applied to arrays of arbitrary dimensions with arbitrary distribution along every dimension. Experimental results are presented on the ...

Communication-Efficient Deterministic Parallel Algorithms for Planar Point Location and 2d Voronoi Diagram

In this paper we describe deterministic parallel algorithms for planar point location and for building the Voronoi Diagram of n co-planar points. These algorithms are designed for BSP-like models of computation, where p processors, with O(n/p) >> O(1) local memory each, communicate through some arbitrary interconnection network. They are communication-efficient since they require, respectively, O...

Contribution à la parallélisation de méthodes numériques à matrices creuses skyline. Application à un module de calcul de modes et fréquences propres de Systus

Distributed memory machines consisting of multiple autonomous processors connected by a network are becoming commonplace. Unlike specialized machines like systolic arrays, such systems of autonomous processors provide virtual parallelism through standard message passing libraries (PVM or MPI). In the area of parallelizing existing numerical algorithms, two main approaches have been...

Publication date: 1998
