A Parallel 3-D FFT Algorithm on Clusters of Vector SMPs
Author
Abstract
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm for clusters of vector symmetric multiprocessor (SMP) nodes. The three-dimensional FFT can be recast as a multirow FFT, which lengthens the innermost loop. We use the multirow FFT to implement the parallel three-dimensional FFT. We report performance results for power-of-two three-dimensional FFTs on a cluster of (pseudo-)vector SMP nodes, the Hitachi SR8000, and obtain about 40 GFLOPS on a 16-node Hitachi SR8000.
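The abstract does not spell out the multirow formulation, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes a plain radix-2 decimation-in-time FFT, power-of-two sizes, and nested Python lists; the names `multirow_fft` and `fft3d` are hypothetical. The point it illustrates is that when many rows are transformed at once, the loop over rows can be placed innermost, so its length is the number of rows rather than the short butterfly span. No vectorization or inter-node parallelism is shown.

```python
import cmath


def multirow_fft(rows):
    """Radix-2 FFT of many equal-length rows at once (hypothetical helper).

    Every butterfly is applied to all rows before moving to the next one,
    so the innermost loop has length len(rows).
    """
    m = len(rows)
    n = len(rows[0])
    assert n > 0 and n & (n - 1) == 0, "row length must be a power of two"
    data = [list(r) for r in rows]  # work on a copy

    # Bit-reversal permutation, applied to every row.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            for r in range(m):
                data[r][i], data[r][j] = data[r][j], data[r][i]

    # Butterfly stages.
    span = 2
    while span <= n:
        half = span // 2
        for k in range(half):
            w = cmath.exp(-2j * cmath.pi * k / span)  # twiddle factor
            for start in range(0, n, span):
                a = start + k
                b = a + half
                for r in range(m):  # innermost loop runs over the rows
                    row = data[r]
                    t = w * row[b]
                    row[b] = row[a] - t
                    row[a] = row[a] + t
        span *= 2
    return data


def fft3d(x):
    """3-D FFT of x[i][j][k] as three multirow FFTs, one per axis."""
    n1, n2, n3 = len(x), len(x[0]), len(x[0][0])
    y = [[[complex(v) for v in line] for line in plane] for plane in x]

    # Axis 3: n1*n2 rows of length n3.
    rows = multirow_fft([y[i][j] for i in range(n1) for j in range(n2)])
    for i in range(n1):
        for j in range(n2):
            y[i][j] = rows[i * n2 + j]

    # Axis 2: n1*n3 rows of length n2.
    rows = multirow_fft([[y[i][j][k] for j in range(n2)]
                         for i in range(n1) for k in range(n3)])
    for i in range(n1):
        for k in range(n3):
            for j in range(n2):
                y[i][j][k] = rows[i * n3 + k][j]

    # Axis 1: n2*n3 rows of length n1.
    rows = multirow_fft([[y[i][j][k] for i in range(n1)]
                         for j in range(n2) for k in range(n3)])
    for j in range(n2):
        for k in range(n3):
            for i in range(n1):
                y[i][j][k] = rows[j * n3 + k][i]
    return y
```

For an n1 x n2 x n3 array, the innermost loop in this sketch has length n1*n2, n1*n3, and n2*n3 for the three passes. That is the kind of loop lengthening the abstract refers to; how the paper maps these passes onto vector pipelines and distributes them across SMP nodes is not covered here.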
Related resources
Modeling Cone-Beam Tomographic Reconstruction Using LogSMP: An Extended LogP Model for Clusters of SMPs
The tomographic reconstruction for cone-beam geometries is a computationally intensive task requiring large memory and computational power to investigate interesting objects. The analysis of its parallel implementation on widely available clusters of SMPs requires an extension of the original LogP model to account for the various communication channels, called LogSMP. The LogSMP model is used i...
Modeling and Simulative Performance Analysis of SMP and Clustered Computer Architectures
The performance characteristics of several classes of parallel computing systems are analyzed and compared using high-fidelity modeling and execution-driven simulation. Processor, bus, and network models are used to construct and simulate the architectures of symmetric multiprocessors (SMPs), clusters of uniprocessors, and clusters of SMPs. To demonstrate a typical use of the models, the perfor...
Technische Universität Chemnitz Sonderforschungsbereich 393 Numerische Simulation auf massiv parallelen Rechnern
The characteristics of irregular algorithms make a parallel implementation difficult, especially for PC clusters or clusters of SMPs. These characteristics may include an unpredictable access behavior to dynamically changing data structures or strong irregular coupling of computations. Problems are an unknown load distribution and expensive irregular communication patterns for data accesses and...
Combining building blocks for parallel multi-level matrix multiplication
EXTENDED ABSTRACT Matrix-matrix multiplication is one of the core computations in many algorithms from scientific computing or numerical analysis, and many efficient realizations have been invented over the years, including many parallel ones. The current trend toward using clusters of PCs or SMPs for scientific computing suggests revisiting matrix-matrix multiplication and investigating efficiency and ...
Performance Analysis of Algorithms on Shared Memory, Message passing and Hybrid Models for Standalone and Clustered SMPs
While algorithms are well understood in their sequential form, comparatively little is known about how to implement parallel algorithms on mainstream parallel programming platforms and run them on SMP-based mainstream systems such as multi-core clusters. The project aims at better understanding the algorithmic techniques like divide and conquer, decrease and conquer, transform and conquer par...
Publication date: 2000