Topological Message Aggregation Techniques for Large-Scale Parallel Systems
Abstract
The high overhead of fine-grained communication is a significant performance bottleneck for many classes of applications written for large-scale parallel systems. This thesis explores techniques for reducing this overhead through topological aggregation, in which fine-grained messages are dynamically combined not just at each source of communication, but also at intermediate points between source and destination processors. The performance characteristics of aggregation and the selection of a virtual topology for routing are analyzed theoretically and experimentally. Schemes that exploit fast intra-node communication to reduce the number of inter-node messages are also explored. The presented techniques are implemented for the Charm++ parallel programming system in the Topological Routing and Aggregation Module (TRAM). It is also demonstrated how TRAM can be automated at the level of the runtime system to function with little or no configuration or input required from the user. Using TRAM, the performance of a number of benchmark and scientific applications is evaluated, with speedups of up to 10x for communication benchmarks and up to 4x for full-fledged scientific applications on petascale systems.
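To illustrate the idea of combining messages at intermediate points, here is a minimal Python sketch of dimension-ordered routing over a virtual mesh. It is not TRAM's API or implementation. The mesh shape, the function names, and the assumption that each processor keeps one aggregation buffer per peer along each mesh dimension are all hypothetical choices made for the example.

```python
from math import prod

def to_coords(rank, dims):
    """Convert a linear rank to mesh coordinates (last dimension varies fastest)."""
    coords = []
    for size in reversed(dims):
        coords.append(rank % size)
        rank //= size
    return tuple(reversed(coords))

def to_rank(coords, dims):
    """Convert mesh coordinates back to a linear rank."""
    rank = 0
    for c, size in zip(coords, dims):
        rank = rank * size + c
    return rank

def next_hop(src, dst, dims):
    """Next rank on the route: correct the first coordinate that differs."""
    s = list(to_coords(src, dims))
    d = to_coords(dst, dims)
    for i in range(len(dims)):
        if s[i] != d[i]:
            s[i] = d[i]
            return to_rank(s, dims)
    return dst  # already at the destination

def route(src, dst, dims):
    """Full path from src to dst; items may be re-aggregated at each hop."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, dims))
    return path

def buffers_direct(dims):
    """Aggregation buffers per processor when aggregating only at the source."""
    return prod(dims) - 1

def buffers_topological(dims):
    """Buffers per processor when sending only to peers along each dimension."""
    return sum(size - 1 for size in dims)

if __name__ == "__main__":
    dims = (4, 4, 4)  # hypothetical 64-processor virtual mesh
    print(route(0, 63, dims))
    print(buffers_direct(dims), buffers_topological(dims))
```

Under these assumptions, a message crosses at most one intermediate processor per mesh dimension, and each processor fills far fewer buffers than it would with one buffer per destination. Fewer buffers fill faster, at the cost of extra hops.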
Similar Resources
Comparison of Message Aggregation Strategies for Parallel Simulations on a High Performance Cluster
Parallel simulations of fine-grained applications usually generate a large number of messages. The overhead for sending these messages over an interconnection network can dramatically limit the speedup of a parallel simulation. In this case, message aggregation techniques can increase the granularity of the application and reduce the communication overhead. This paper compares sender-initiated an...
Optimizing Message Aggregation for Parallel Simulation on High Performance Clusters
High performance clusters (HPCs) based on commodity hardware are becoming more and more popular in the parallel computing community. These new platforms offer hardware capable of very low latency and very high throughput at an unbeatable cost, making them attractive for a large variety of parallel and distributed applications. With adequate communication software, HPCs have the potential ...
Software Topological Message Routing and Aggregation Techniques for Large Scale Parallel Systems
Supercomputing networks are designed to minimize message latency. The focus on low latency implies a best effort attempt to deliver each message injected onto the network as soon as possible. Ostensibly, prioritizing message latency is important, and there are numerous examples of applications that benefit from it, but while this latency-centric view of a network seems logical, it often leads t...
A comprehensive experimental comparison of the aggregation techniques for face recognition
In face recognition, one of the most important problems to tackle is a large amount of data and the redundancy of information contained in facial images. There are numerous approaches attempting to reduce this redundancy. One of them is information aggregation based on the results of classifiers built on selected facial areas being the most salient regions from the point of view of classificati...
A Message-Passing Distributed Memory Parallel Algorithm for a Dual-Code Thin Layer, Parabolized Navier-Stokes Solver
In this study, the results of parallelization of a 3-D dual code (Thin Layer, Parabolized Navier-Stokes solver) for solving supersonic turbulent flow around body and wing-body combinations are presented. As a serial code, TLNS solver is very time consuming and takes a large part of memory due to the iterative and lengthy computations. Also for complicated geometries, an exceeding number of grid...