Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming
Abstract
Hybrid parallel programming with MPI for internode communication in conjunction with a shared-memory programming model to manage intranode parallelism has become a dominant approach to scalable parallel programming. While this model provides a great deal of flexibility and performance potential, it saddles programmers with the complexity of utilizing two parallel programming systems in the same application. We introduce an MPI-integrated shared-memory programming model that is incorporated into MPI through a small extension to the one-sided communication interface. We discuss the integration of this interface with the upcoming MPI 3.0 one-sided semantics and describe solutions for providing portable and efficient data sharing, atomic operations, and memory consistency. We describe an implementation of the new interface in the MPICH2 and Open MPI implementations and demonstrate an average performance improvement of 40% to the communication component of a five-point stencil solver.
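The interface proposed here fed into what became MPI-3's shared-memory windows. As a rough illustration (not the paper's own code), the sketch below shows the now-standard pattern: split off the ranks that share a node, allocate a window backed by shared memory, and access a neighbor's segment by plain pointer dereference instead of MPI_Get.

```c
/* Sketch of MPI-3 shared-memory windows: node-local ranks allocate a
 * shared window and read each other's data with direct load/store. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Group the ranks that share a node so the window can live in
     * physically shared memory. */
    MPI_Comm nodecomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodecomm);

    int rank, nprocs;
    MPI_Comm_rank(nodecomm, &rank);
    MPI_Comm_size(nodecomm, &nprocs);

    /* Each rank contributes one int to the node-wide shared window. */
    int *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(sizeof(int), sizeof(int), MPI_INFO_NULL,
                            nodecomm, &mine, &win);

    /* Passive-target epoch: direct load/store accesses to the window. */
    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    *mine = rank * 100;

    /* Locate the next rank's segment within the same window. */
    MPI_Aint size;
    int disp_unit;
    int *theirs;
    MPI_Win_shared_query(win, (rank + 1) % nprocs, &size, &disp_unit,
                         (void *)&theirs);

    /* Memory consistency: flush local stores, wait for all ranks,
     * then refresh the local view before reading the neighbor's value. */
    MPI_Win_sync(win);
    MPI_Barrier(nodecomm);
    MPI_Win_sync(win);
    printf("rank %d sees neighbor value %d\n", rank, *theirs);

    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Comm_free(&nodecomm);
    MPI_Finalize();
    return 0;
}
```

Run with, e.g., `mpirun -n 4 ./a.out` on a single node; the sync/barrier/sync sequence is the portable way to make the stores visible under both the unified and separate memory models.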
Similar Works
Leveraging MPI-3 Shared-Memory Extensions for Efficient PGAS Runtime Systems
The relaxed semantics and rich functionality of MPI-3's one-sided communication primitives make MPI an attractive candidate for implementing PGAS models. However, the performance of such an implementation suffers from the fact that current MPI RMA implementations typically have a large overhead when the source and target of a communication request share a common, local physical memory. In ...
Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows
The Message Passing Interface (MPI) has been very popular for programming parallel scientific applications. As multi-core architectures have become prevalent, a major question has emerged about the use of MPI within a compute node and its impact on communication costs. The one-sided communication interface in MPI provides a mechanism to reduce communication costs by removing matching r...
A Multiprotocol Communication Support for the Global Address Space Programming Model on the IBM SP
The paper describes an efficient communication support for the global address space programming model on the IBM SP, a commercial example of SMP (symmetric multiprocessor) clusters. Our approach integrates shared memory with active messages, threads, and remote memory copy between nodes. The shared memory operations offer substantial performance improvement over LAPI, IBM's one-sided communic...
A PGAS-based implementation for the unstructured CFD solver TAU
Whereas most applications in the realm of the partitioned global address space make use of PGAS languages, here we demonstrate an implementation on top of a PGAS API. In order to improve the scalability of the unstructured CFD solver TAU, we have implemented an asynchronous communication strategy on top of the PGAS API of GPI. We have replaced the bulk-synchronous two-sided MPI exchange with an a...
A scalable replay-based infrastructure for the performance analysis of one-sided communication
Partitioned global address space (PGAS) languages combine the convenient abstraction of shared memory with the notion of affinity, extending multi-threaded programming to large-scale systems with physically distributed memory. However, in spite of their obvious advantages, PGAS languages still lack appropriate tool support for performance analysis, one of the reasons why their adoption is still...