Efficient Data Parallel Implementations of Highly Irregular Problems
Author
Abstract
This dissertation presents optimization techniques for the efficient data parallel formulation and implementation of highly irregular problems, and applies them to O(N) hierarchical N-body methods for large-scale N-body simulations. It demonstrates that highly irregular scientific and engineering problems, such as nonadaptive and adaptive O(N) hierarchical N-body methods, can be implemented efficiently in high-level data parallel languages such as High Performance Fortran (HPF) on scalable parallel architectures. It also presents an empirical study of the accuracy-cost tradeoffs of O(N) hierarchical N-body methods. The dissertation first develops optimization techniques for efficient data parallel implementation of irregular problems, focusing on minimizing data movement through careful management of data distribution and data references, both between the memories of different nodes and within the memory hierarchy of each node. For hierarchical N-body methods, the optimizations for arithmetic efficiency include recognizing the dominant computations as matrix-vector multiplications and aggregating them into multiple-instance matrix-matrix multiplications. Experimental results with a Connection Machine Fortran implementation of Anderson’s hierarchical N-body method demonstrate that performance competitive with that of the best message-passing implementations of the same class of methods can be achieved. The dissertation also presents a general data parallel formulation for highly irregular applications, and applies it to an adaptive hierarchical N-body method with highly nonuniform particle distributions.
The formulation consists of (1) a method for linearizing irregular data structures, (2) a data parallel implementation (in HPF) of graph partitioning algorithms applied to the linearized data structure, and (3) techniques for expressing the irregular communications and nonuniform computations associated with the elements of linearized data structures. Experimental results demonstrate that efficient data parallel (HPF) implementations of highly nonuniform problems are feasible with proper language, compiler, and runtime support. Our data parallel N-body code provides a much-needed “benchmark” code for evaluating and improving HPF compilers. This thesis also develops the first data parallel (HPF) implementation of the geometric partitioning algorithm due to Miller, Teng, Thurston and Vavasis, one of only two provably good partitioning schemes. Our data parallel formulation makes extensive use of segmented prefix sums and parallel selections, and provides a data parallel procedure for geometric sampling. Experiments on partitioning particles for load balance and data interactions, as required in hierarchical N-body algorithms, show that the geometric partitioning algorithm has an efficient data
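The abstract names segmented prefix sums as a core building block of the data parallel formulation. As a rough illustration only, the sketch below shows what a segmented prefix sum computes, written in serial Python rather than HPF. The function names `combine` and `segmented_prefix_sum` and the flag convention (a `True` flag marks the first element of a segment) are assumptions made for this example and are not taken from the dissertation.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple

Pair = Tuple[bool, int]


def combine(left: Pair, right: Pair) -> Pair:
    """Associative operator on (segment_start_flag, value) pairs.

    If the right element starts a new segment, its value replaces the
    running sum; otherwise the two values are added. Associativity is
    what allows a parallel scan to evaluate this in any grouping.
    """
    left_flag, left_value = left
    right_flag, right_value = right
    return (left_flag or right_flag,
            right_value if right_flag else left_value + right_value)


def segmented_prefix_sum(values: Sequence[int], flags: Sequence[bool]) -> List[int]:
    """Inclusive prefix sum that restarts wherever flags[i] is True.

    Hypothetical helper for illustration; evaluated serially here.
    """
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")
    return [value for _, value in accumulate(zip(flags, values), combine)]


# Three segments: [1, 2, 3], [4, 5], [6]
print(segmented_prefix_sum([1, 2, 3, 4, 5, 6],
                           [True, False, False, True, False, True]))
```

Writing the scan in terms of an associative operator, instead of a loop with a reset, is a design choice meant to show why the operation suits data parallel languages: any scan primitive that accepts an associative operator could, in principle, evaluate it across processors.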
Similar resources
Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors
Graph problems are finding increasing applications in high performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multipr...
Efficient implementation of low time complexity and pipelined bit-parallel polynomial basis multiplier over binary finite fields
This paper presents two efficient implementations of fast and pipelined bit-parallel polynomial basis multipliers over GF(2^m) by irreducible pentanomials and trinomials. The architecture of the first multiplier is based on a parallel and independent computation of powers of the polynomial variable. In the second structure only even powers of the polynomial variable are used. The par...
Designing irregular parallel algorithms with mutual exclusion and lock-free protocols
Irregular parallel algorithms pose a significant challenge for achieving high performance because of the difficulty of predicting memory access patterns or execution paths. Within an irregular application, fine-grained synchronization is one technique for managing the coordination of work; but in practice the actual performance for irregular problems depends on the input, the access pattern to sha...
Some GPU Algorithms for Graph Connected Components and Spanning Tree
Graphics Processing Units (GPUs) are application-specific accelerators that provide a high performance-to-cost ratio and are widely available and used, which makes them a ubiquitous accelerator. A computing paradigm based on them is the general-purpose computing on the GPU (GPGPU) model. The GPU, due to its graphics lineage, is better suited to data-parallel, data-regular algorithms. T...
An Efficient Approach on Spatial Big Data Related to Wireless Networks and Its Applications
Spatial big data plays an important role in wireless network applications. Spatial and spatio-temporal problems have a distinct role in big data compared to common relational problems. To address those problems, we describe three applications for spatial big data. Each application imposes a specific design, and we are developing our work on high...