Overlapping computation and communication of three-dimensional FDTD on a GPU cluster
Authors
Abstract
Large-scale electromagnetic field simulations using the FDTD (finite-difference time-domain) method require the use of GPU (graphics processing unit) clusters. However, the communication overhead caused by slow interconnections becomes a major performance bottleneck. In this paper, as a way to remove the bottleneck, we propose the ‘kernel-split method’ and the ‘host-buffer method’, which overlap computation and communication for the FDTD simulation on the GPU cluster. The host-buffer method in particular enables overlapping without any modifications to the update kernels that are already in use. We also present theoretical formulas to predict the overlap threshold and the total throughput for each method. By using our overlap methods with 6 GPU nodes, we demonstrate that the total performance of 3D FDTD reaches 92% of a six-fold increase, which is the upper limit that would be reached if there were no communication overhead. © 2012 Elsevier B.V. All rights reserved.
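As a rough reading of the figures in the abstract, the sketch below works out what 92% of a six-fold increase means as a speedup, and contrasts a step with and without overlap. It is a minimal illustration only: the function names, the idealized "sum versus max" timing model, and the millisecond timings are assumptions made here, not the theoretical formulas the paper presents.

```python
def effective_speedup(nodes: int, efficiency: float) -> float:
    """Speedup over one node when `efficiency` of ideal linear scaling is reached."""
    return nodes * efficiency


def step_time(t_compute: float, t_comm: float, overlap: bool) -> float:
    """Idealized time per step (an assumed model, not the paper's formula):
    the serial sum without overlap, the larger of the two with full overlap."""
    return max(t_compute, t_comm) if overlap else t_compute + t_comm


# Figures from the abstract: 6 GPU nodes at 92% of the ideal six-fold increase.
print(round(effective_speedup(6, 0.92), 2))  # 5.52

# Hypothetical timings in milliseconds, chosen only for illustration.
print(step_time(10.0, 4.0, overlap=False))  # 14.0
print(step_time(10.0, 4.0, overlap=True))   # 10.0
```

In this simplified model, communication is fully hidden whenever it takes no longer than the computation it overlaps with, which is the intuition behind an overlap threshold.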
Similar resources
A Novel Scheme for High Performance Finite-Difference Time-Domain (FDTD) Computations Based on GPU
Finite-Difference Time-Domain (FDTD) has been proved to be a very useful computational electromagnetic algorithm. However, the scheme based on traditional general purpose processors can be computationally prohibitive and require thousands of CPU hours, which hinders the large-scale application of FDTD. With rapid progress on GPU hardware capability and its programmability, we propose in this pa...
Computation-Communication Overlap of Linpack on a GPU-Accelerated PC Cluster
In this paper, we propose an approach to obtaining enhanced performance of the Linpack benchmark on a GPU-accelerated PC cluster connected via relatively slow inter-node connections. For one node with a quad-core Intel Xeon W3520 processor and an NVIDIA Tesla C1060 GPU card, we implement a CPU–GPU parallel double-precision general matrix–matrix multiplication (dgemm) operation, and achieve a per...
Multi-GPU-based Swendsen-Wang multi-cluster algorithm with reduced data traffic
The computational performance of multi-GPU applications can be degraded by the data communication between each GPU. To realize high-speed computation with multiple GPUs, we should minimize the cost of this data communication. In this paper, I propose a multiple GPU computing method for the Swendsen–Wang (SW) multi-cluster algorithm that reduces the data traffic between each GPU. I realize this ...
Application of the parabolic equation method to the analysis of indoor wave propagation problems
With the rapid growth of indoor wireless communication systems, the need to accurately model radio wave propagation inside the building environments has increased. Many site-specific methods have been proposed for modeling indoor radio channels. Among these methods, the ray tracing algorithm and the finite-difference time domain (FDTD) method are the most popular ones. The ray tracing approach ...
Scalable lattice Boltzmann solvers for CUDA GPU clusters
The lattice Boltzmann method (LBM) is an innovative and promising approach in computational fluid dynamics. From an algorithmic standpoint it reduces to a regular data parallel procedure and is therefore well-suited to high performance computations. Numerous works report efficient implementations of the LBM for the GPU, but very few mention multi-GPU versions and even fewer GPU cluster implemen...
Journal title:
- Computer Physics Communications
Volume 183 Issue
Pages -
Publication date 2012