35. Two-Level Domain Decomposition for Multi-clusters
Abstract
We discuss the design of parallel algorithms for solving elliptic problems on multi-cluster computers. Multi-clusters can be viewed as two-level parallel architectures, since communication between clusters is usually much slower than communication or memory access within each cluster. We introduce algorithms that use two levels of parallelism to match the multi-cluster architecture. Efficient parallel algorithms that rely on fast, uniform communication have been developed extensively in the past; we intend to use them for parallel computation within the clusters. On top of these local parallel algorithms, new robust parallel algorithms are needed that can work with a few clusters linked by a slow communication network. We present a two-level domain decomposition algorithm that uses an Aitken or Steffensen acceleration procedure combined with Schwarz for the outer loop and standard parallel domain decomposition for the inner loop. Finally, we demonstrate the value of our algorithm for metacomputing.

We consider the design of parallel algorithms for multi-cluster architectures with a few heterogeneous clusters linked by an affordable network with a bandwidth of order 10 Mb/s. Each cluster can be a shared-memory multiprocessor machine or an MIMD computer with a fast internal network. The elapsed time for a given processor to access a given piece of data on such an architecture then depends strongly on where the data is located. Fast, scalable parallel algorithms for the Laplace problem based on domain decomposition and/or multigrid on a uniform MIMD architecture usually have very poor efficiency on a multi-cluster machine with a slow inter-cluster network. By contrast, a numerically inefficient iterative domain decomposition algorithm, such as the classical additive Schwarz procedure for the Laplace problem, is easy to implement, robust, and scalable on a multi-cluster architecture.
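To make the contrast concrete, here is a minimal sketch of additive Schwarz with Aitken acceleration on a model problem. It is an illustration under stated assumptions, not the authors' implementation: the problem is the 1D equation -u'' = f on (0, 1) with homogeneous Dirichlet conditions, discretized by second-order finite differences and split into two overlapping subdomains, and every function name (`solve_dirichlet`, `schwarz_step`, `aitken`, `aitken_schwarz`) is hypothetical. In this 1D setting each interface value is a scalar whose error contracts by a constant factor every two sweeps, so Aitken's formula applied to the even iterates recovers the limit up to rounding.

```python
def solve_dirichlet(f, h, left, right):
    """Solve -u'' = f at the interior points of a 1D grid with spacing h,
    given Dirichlet values left/right. Uses the Thomas algorithm on the
    tridiagonal system (-1, 2, -1). Returns values including both ends."""
    n = len(f)
    if n == 0:
        return [left, right]
    rhs = [h * h * fi for fi in f]
    rhs[0] += left
    rhs[-1] += right
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return [left] + u + [right]


def schwarz_step(g, f, h, il, ir):
    """One additive Schwarz sweep on the interface values g = (g1, g2).
    Subdomain 1 covers grid points 0..ir, subdomain 2 covers il..N+1 (il < ir).
    The two local solves are independent, so they could run in parallel."""
    g1, g2 = g
    u1 = solve_dirichlet(f[1:ir], h, 0.0, g1)
    u2 = solve_dirichlet(f[il + 1:-1], h, g2, 0.0)
    # Each subdomain reads its next boundary value from the other's solution.
    return (u2[ir - il], u1[il])


def aitken(x0, x1, x2):
    """Aitken's delta-squared extrapolation of a linearly converging sequence."""
    denom = (x2 - x1) - (x1 - x0)
    return x2 - (x2 - x1) ** 2 / denom


def aitken_schwarz(f, h, il, ir):
    """Run four additive Schwarz sweeps from a zero initial guess, then
    extrapolate each interface value from its iterates 0, 2 and 4."""
    g = (0.0, 0.0)
    history = [g]
    for _ in range(4):
        g = schwarz_step(g, f, h, il, ir)
        history.append(g)
    g1 = aitken(history[0][0], history[2][0], history[4][0])
    g2 = aitken(history[0][1], history[2][1], history[4][1])
    return (g1, g2), history


# Demo: f = 1, whose discrete solution is u(x) = x (1 - x) / 2 at the nodes.
n = 99
h = 1.0 / (n + 1)
f = [1.0] * (n + 2)
il, ir = 45, 55
(g1, g2), history = aitken_schwarz(f, h, il, ir)
exact_right = (ir * h) * (1.0 - ir * h) / 2.0
print("plain Schwarz error after 4 sweeps:", abs(history[4][0] - exact_right))
print("Aitken-accelerated error:          ", abs(g1 - exact_right))
```

Only the interface values cross subdomain boundaries, which is what makes this kind of outer loop attractive when the link between clusters is slow. In higher dimensions the interface trace is a vector rather than a scalar, so the scalar formula above would not apply as written.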
Our goal is therefore to design an acceleration procedure for iterative domain decomposition methods analogous to additive Schwarz that increases the numerical efficiency of the basic underlying algorithm but stays easy to implement, robust, and scalable on multi-clusters. The common way to accelerate the additive Schwarz method is to introduce a coarse-grid operator [LSFQ97]. The resulting modified Schwarz algorithm becomes numerically efficient, but the coarse-grid computation can become a bottleneck for parallel processing. We take a different point of view here and try to extract, from a finite sequence of interface iterates generated by the Schwarz iterative procedure or an analogous relaxation method, an accurate prediction of the limit of the interface sequence. We will show in simple cases, such as finite difference approximation of an elliptic operator with con-