CUDA platform

A Lattice-Boltzmann solver for 3D fluid simulation on GPU

Journal: Simulation Modelling Practice and Theory 2012

Pablo R. Rinaldi, E. A. Dari, Marcelo J. Vénere, Alejandro Clausse

A three-dimensional Lattice-Boltzmann fluid model with nineteen discrete velocities was implemented using the NVIDIA Graphics Processing Unit (GPU) programming language "Compute Unified Device Architecture" (CUDA). Previous LBM GPU implementations required two steps to maximize memory bandwidth due to memory access restrictions of earlier versions of the CUDA toolkit and hardware capabilities. In this ...

A Novel Method of Parallel Computation for the Whole Scene Test in Power System

2012

Shi Jing, Qi Huang, Jianbo Yi

In this paper, the design and implementation of a new parallel computing method based on the CUDA (Compute Unified Device Architecture) platform are described in detail. The method includes algorithms for matrix fraction and partial LU decomposition, which are used to support parallel computing for simulation of the whole scene test in power systems. The paper describes all the steps of algorithm implementat...

MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

2008

John A. Stratton, Sam S. Stone, Wen-mei W. Hwu

CUDA is a data-parallel programming model that supports several key abstractions (thread blocks, hierarchical memory and barrier synchronization) for writing applications. This model has proven effective in programming GPUs. In this paper we describe a framework called MCUDA, which allows CUDA programs to be executed efficiently on shared-memory, multi-core CPUs. Our framework consists ...

Effects of Easy Hybrid Parallelization with CUDA for Numerical-Atomic-Orbital Density Functional Theory Calculation

Journal: CoRR 2014

Jae-Hyeon Parq, Erik Sevre, Sang-Mook Lee

We modified an MPI-friendly density functional theory (DFT) source code within hybrid parallelization including CUDA. Our objective is to find out how simple conversions within the hybrid parallelization with mid-range GPUs affect DFT code not originally suited to CUDA. We established several rules of hybrid parallelization for numerical-atomic-orbital (NAO) DFT codes. The test was performed on a ...

Analytical Performance Prediction for Evaluation and Tuning of GPGPU Applications

2009

Sara S. Baghsorkhi, Matthieu Delahaye, William D. Gropp, Wen-mei W. Hwu

In this paper we present an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information to an auto-tuning compiler and help it narrow the search to the more promising implementations. This work is based on NVIDIA GPUs using CUDA (Compute Unified Device Architecture). We analyze each CUDA kernel an...

A Framework for Transparent Execution of Massively-Parallel Applications on CUDA and OpenCL

2015

Jörn Teuber, Rene Weller, Gabriel Zachmann

We present a novel framework for simultaneous development on different massively parallel platforms. Currently, our framework supports CUDA and OpenCL, but it can easily be adapted to other programming languages. The main idea is to provide an easy-to-use abstraction layer that encapsulates calls to one's own parallel device code as well as to library functions. With our framework the code has t...

Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

2017

Gábor Dániel Balogh, I. Z. Reguly, Gihan R. Mudalige

Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built using them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL) each supporting d...

The GPU on the Matrix-Matrix Multiply: Performance Study and Contributions

2009

José M. Cecilia, José M. García, Manuel Ujaldon

Modern graphics processing units (GPUs) have been at the leading edge of increasing chip-level parallelism over the last ten years, and the CUDA programming model has recently allowed us to exploit their power across many computational domains. Among these, dense linear algebra algorithms emerge as a natural fit for CUDA and the GPU because they are usually inherently parallel and can naturally...

CUDA Parallelization of a 2-D Non-hydrostatic Compressible Atmospheric Model

2009

Matthew R. Norman

Computational fluid dynamics in general requires large computational resources. The same is true for an atmospheric model that simulates non-hydrostatic, density-stratified flow with a gravity source term. There have been many applications of CUDA to CFD problems, as can be seen from the many papers on [...]. In fact, a full-scale global atmospheric model has been parallelized for CUDA. For my graduate ...

An Application-Oriented Approach for Accelerating Data-Parallel Computation with Graphics Processing Unit

2008

S. Ponce, J. Huang, S. I. Park, C. Khoury, Y. Cao, F. Quek, W. Feng

This paper presents a novel parallelization and quantitative characterization of various optimization strategies for data-parallel computation on a graphics processing unit (GPU) using NVIDIA's new GPU programming framework, Compute Unified Device Architecture (CUDA). CUDA is an easy-to-use development framework that has drawn the attention of many different application areas looking for dramati...
