Computation-Communication Overlap on Network-of-Workstation Multiprocessors
Abstract
This paper describes and evaluates a compiler transformation that improves the performance of parallel programs on Network-of-Workstation (NOW) shared-memory multiprocessors. The transformation overlaps the communication time resulting from non-local memory accesses with the computation time in parallel loops, effectively hiding the latency of the remote accesses. It peels the iterations that access remote data from a parallel loop and reschedules them after the iterations that access only local data (local-only iterations). Asynchronous prefetching of remote data overlaps the non-local access latency with the execution of the local-only iterations. Experimental evaluation of the transformation on a NOW multiprocessor indicates that it is generally effective in improving parallel execution time (by up to a factor of 1.9). The extent of the benefit is determined by three factors: the size of the local-only computations, the significance of remote memory access latency, and the position of the iterations that access remote data in a parallel loop.
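The peel-and-reschedule idea can be illustrated with a small Python sketch. This is not the paper's compiler transformation: the three-point stencil, the function names (`fetch_remote`, `stencil_original`, `stencil_overlapped`), the simulated latency, and the use of a background thread to stand in for asynchronous prefetching are all assumptions made for illustration. One processor owns the block `[lo, hi)` of array `b`; only the first and last iterations of its block read elements owned by neighbours.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_remote(b, indices, latency=0.01):
    """Hypothetical stand-in for a remote read: wait, then return copies."""
    time.sleep(latency)  # simulated network latency (assumed value)
    return {j: b[j] for j in indices}


def stencil_original(b, lo, hi):
    """Reference loop: all iterations of the block [lo, hi) in program order."""
    return [b[i - 1] + b[i] + b[i + 1] for i in range(lo, hi)]


def stencil_overlapped(b, lo, hi):
    """Same loop, with the iterations that touch remote data peeled off."""
    halo = [lo - 1, hi]               # elements assumed to live on neighbours
    remote_iters = sorted({lo, hi - 1})
    out = {}
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the prefetch first so it runs alongside the local work.
        pending = pool.submit(fetch_remote, b, halo)
        # Local-only iterations: every operand is inside the owned block.
        for i in range(lo + 1, hi - 1):
            out[i] = b[i - 1] + b[i] + b[i + 1]
        # Blocks only if the prefetch has not finished yet.
        cache = pending.result()

    def read(j):
        # Halo elements come from the prefetched copies, the rest locally.
        return cache[j] if j in cache else b[j]

    # Peeled iterations run last, after the remote data has arrived.
    for i in remote_iters:
        out[i] = read(i - 1) + read(i) + read(i + 1)
    return [out[i] for i in range(lo, hi)]


if __name__ == "__main__":
    data = list(range(10))
    print(stencil_overlapped(data, 3, 7) == stencil_original(data, 3, 7))
```

In this sketch the prefetch latency is hidden only to the extent that the local-only loop takes at least as long as the fetch, which mirrors the first two factors named in the abstract.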
Similar resources
Overlap of Computation and Communication on Shared-Memory
This paper describes and evaluates a compiler transformation that improves the performance of parallel programs on Network-of-Workstation (NOW) shared-memory multiprocessors. The transformation overlaps the communication time resulting from non-local memory accesses with the computation time in parallel loops to effectively hide the latency of the remote accesses. The transformation peels from a ...
HyFi: Architecture-Independent Parallelism on Networks of Multiprocessors
A network of parallel workstations promises cost-effective parallel computing. This paper presents the HyFi (Hybrid Filaments) package, which can be used to create architecture-independent parallel programs, that is, programs that are portable and efficient across different parallel machines. HyFi integrates Shared Filaments (SF), which provides parallelism on shared-memory multiprocessors, and Di...
Non-Uniform Partitioning of Finite Difference Methods Running on SMP Clusters
A multicomputer or workstation cluster with multiprocessor nodes introduces significant need and opportunity for overlapping communication with computation. We evaluate partitioning strategies for an important application class, finite difference methods, running on clusters of symmetric multiprocessors. Our results show that even for a regular, uniform finite difference method, a non-uniform partitio...
Compression-Based Ray Casting of Very Large Volume Data in Distributed Environments
This paper proposes a new parallel/distributed ray-casting scheme for very large volume data that can be effectively used in distributed environments. Our method, based on data compression, attempts to enhance the rendering speedups by quickly reconstructing voxel data from local memory rather than expensively fetching them from remote memory spaces. Our compression-based volume rendering scheme...
Demand-based coscheduling of parallel jobs on multiprogrammed multiprocessors
This thesis describes demand-based coscheduling, a new approach to scheduling parallel computations on multiprogrammed multiprocessors. In demand-based coscheduling, rather than making the pessimistic assumption that all the processes constituting a parallel job must be simultaneously scheduled in order to achieve good performance, information about which processes are communicating is used in ...