Vector Prefix and Reduction Computation on Coarse-Grained, Distributed-Memory Parallel Machines
Authors
Abstract
Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split algorithm, for vector prefix and reduction computation on coarse-grained, distributed-memory parallel machines. Our algorithms are relatively architecture independent and can be used effectively in many applications such as Pack/Unpack, Array Prefix/Reduction Functions, and Array Combining Scatter Functions, which are defined in Fortran 90 and in High Performance Fortran. Experimental results on the CM-5 are presented.
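To make the two primitives concrete, here is a minimal sketch of what vector prefix and vector reduction compute. It is a sequential simulation of the semantics only, not the direct or split algorithm, whose details the abstract does not give. The function names and the list-of-lists layout (one equal-length vector per simulated processor) are assumptions made for illustration.

```python
from itertools import accumulate
from operator import add


def vector_reduce(local_vectors, op=add):
    """Element-wise reduction over all processors' vectors.

    local_vectors[i] is the vector held by (simulated) processor i.
    Returns a single vector whose j-th entry combines element j of
    every processor's vector with `op`.
    """
    result = []
    for column in zip(*local_vectors):  # element j from every processor
        acc = column[0]
        for x in column[1:]:
            acc = op(acc, x)
        result.append(acc)
    return result


def vector_prefix(local_vectors, op=add):
    """Inclusive element-wise prefix over processors.

    Returns one vector per processor; processor i's result combines
    the vectors of processors 0..i with `op`.
    """
    # Scan each element position across processors, then transpose back
    # so that row i is the result vector for processor i.
    columns = [list(accumulate(col, op)) for col in zip(*local_vectors)]
    return [list(row) for row in zip(*columns)]


# Three simulated processors, each holding a vector of length 3.
vectors = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
prefix = vector_prefix(vectors)
total = vector_reduce(vectors)
```

On a real distributed-memory machine each row would live on a different processor and the combination would require communication; the sketch only fixes what the expected result is, so that a parallel implementation can be checked against it.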
Similar resources
A High Performance Parallel IP Lookup Technique Using Distributed Memory Organization and ISCB-Tree Data Structure
The IP lookup process is a key bottleneck in routing due to the increase in routing table size, increasing traffic, and migration to IPv6 addresses. IP address lookup involves computation of the Longest Prefix Matching (LPM), for which existing solutions, such as BSD radix tries, scale poorly when traffic in the router increases or when employed for IPv6 address lookups. In this paper, we describe a ...
PACK/UNPACK on Coarse-Grained Distributed Memory Parallel Machines
PACK/UNPACK are Fortran 90/HPF array construction functions which derive new arrays from existing arrays. We present algorithms for performing these operations on coarse-grained parallel machines. Our algorithms are relatively architecture independent and can be applied to arrays of arbitrary dimensions with arbitrary distribution along every dimension. Experimental results are presented on the
Communication-Efficient Deterministic Parallel Algorithms for Planar Point Location and 2d Voronoi Diagram
In this paper we describe deterministic parallel algorithms for planar point location and for building the Voronoi Diagram of n co-planar points. These algorithms are designed for BSP-like models of computation, where p processors, with O(n/p) >> O(1) local memory each, communicate through some arbitrary interconnection network. They are communication-efficient since they require, respectively, O...
Contribution à la parallélisation de méthodes numériques à matrices creuses skyline. Application à un module de calcul de modes et fréquences propres de Systus
Distributed memory machines consisting of multiple autonomous processors connected by a network are becoming commonplace. Unlike specialized machines like systolic arrays, such systems of autonomous processors provide virtual parallelism through standard message passing libraries (PVM or MPI). In the area of parallelizing existing numerical algorithms, two main approaches have been...