CUDA platform

Graph-Based Substructure Pattern Mining Using CUDA Dynamic Parallelism

2013

Fei Wang Jianqiang Dong Bo Yuan

CUDA is an advanced massively parallel computing platform that can provide high-performance computing power at a much more affordable cost. In this paper, we present a parallel graph-based substructure pattern mining algorithm using CUDA Dynamic Parallelism. The key contribution is a parallel solution to traversing the DFS (Depth First Search) code tree. Furthermore, we implement a parallel frequ...


Implementing the Dslash Operator in OpenCL (Technical Report WM-CS-2010-03, College of William & Mary, Department of Computer Science)

2010

Andy Kowalski Xipeng Shen

The Dslash operator is used in Lattice Quantum Chromodynamics (LQCD) applications to implement a Wilson-Dirac sparse matrix-vector product. Typically the Dslash operation has been implemented as a parallel program. Today’s Graphics Processing Units (GPUs) are designed to do highly parallel numerical calculations for 3D graphics rendering. This design works well with scientific applications such ...


An MPI-CUDA implementation of an improved Roe method for two-layer shallow water systems

Journal: J. Parallel Distrib. Comput. 2012

Marc de la Asunción José Miguel Mantas Manuel Jesús Castro Díaz Enrique Domingo Fernández-Nieto

The numerical solution of two-layer shallow water systems is required to accurately simulate stratified fluids, which are ubiquitous in nature: they appear in atmospheric flows, ocean currents, oil spills, and so on. Moreover, implementing the numerical schemes that solve these models in realistic scenarios places huge demands on computing power. In this paper, we tackle the acceleration of t...


CU-Simulator: A Parallel Scalable Simulation Platform for Radio Channel in Wireless Sensor Networks

Journal: Ad Hoc & Sensor Wireless Networks 2015

Liheng Jian Ying Liu Weidong Yi

Because of their computationally intensive nature, the currently available WSN simulators, which are based on the traditional CPU computing architecture, cannot scale linearly. In this paper, we propose and set up CU-Simulator, a parallel radio channel simulator that improves the performance of simulating data packet transmission in WSNs using NVIDIA’s CUDA-enabled GPU parallel computing archi...


Computing Strongly Connected Components in Parallel on

2010

Jiří Barnat Petr Bauch Luboš Brim

The problem of decomposition of a directed graph into its strongly connected components is a fundamental graph problem inherently present in many scientific and commercial applications. In this paper we show how existing parallel algorithms can be reformulated in order to be accelerated by NVIDIA CUDA technology. In particular, we design a new CUDA-aware procedure for pivot selection and we red...


Sparse GPU Voxelization of Yarn-Level Cloth

Journal: Comput. Graph. Forum 2017

Jorge Lopez-Moreno David Miraut Gabriel Cirio Miguel A. Otaduy

glGenBuffers(1, &modelMatrixbuffer);
glBindBuffer(GL_ARRAY_BUFFER, modelMatrixbuffer);
glBufferData(GL_ARRAY_BUFFER, m_numberOfProfiles * sizeof(mat4), NULL, GL_DYNAMIC_DRAW);
// Register VBO with CUDA
glBindBuffer(GL_ARRAY_BUFFER, modelMatrixbuffer);
registerGLBufferObject(modelMatrixbuffer, &m_cuda_vbo_resource);
m_slices = (glm::mat4 *)glMapBuffer ...


Speeding up the BioHEL evolutionary learning system using GPGPUs

2010

María A. Franco Natalio Krasnogor Jaume Bacardit

The BioHEL system is an evolutionary learning system designed to cope with large-scale datasets. The system has several characteristics aimed at this kind of problem, such as a special representation to determine the relevant attributes in a rule and the use of a windowing system, among others. Recently, we have extended the system to perform the rule evaluation process inside NVIDIA...


Mixing Graphics and Compute for Real-Time Multiview Human Body Tracking

2014

Boguslaw Rymut Bogdan Kwolek

This paper presents an effective algorithm for 3D model-based human motion tracking using a GPU-accelerated particle swarm optimization. The tracking involves configuring the 3D human model in the pose described by each particle and then rasterizing it in each camera view. In order to accelerate the calculation of the fitness function, which is the most computationally demanding operation of the...


Optimizing CUDA Shared Memory Usage

2015

Shuang Gao Gregory D. Peterson

CUDA shared memory is fast, on-chip storage. However, the bank conflict issue could cause a performance bottleneck. Current NVIDIA Tesla GPUs support memory bank accesses with configurable bit-widths. While this feature provides an efficient bank mapping scheme for 32-bit and 64-bit data types, it becomes trickier to solve the bank conflict problem through manual code tuning. This paper present...
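The bank conflict issue mentioned in this abstract can be illustrated with a small host-side model. The sketch below is not the paper's method. It assumes 32 banks and a simple mapping in which consecutive bank-width words go to consecutive banks, and the helper names (`bank_of`, `conflict_degree`, `column_addresses`) are hypothetical. It shows why a warp reading one column of a 32-wide tile conflicts heavily, and why padding each row by one element avoids the conflict.

```python
NUM_BANKS = 32         # assumption: 32 shared memory banks
BANK_WIDTH_BYTES = 4   # assumption: 4-byte bank width (8 would model a 64-bit mode)


def bank_of(byte_address, bank_width=BANK_WIDTH_BYTES, num_banks=NUM_BANKS):
    """Bank index of a byte address under the simple modulo mapping assumed here."""
    return (byte_address // bank_width) % num_banks


def conflict_degree(byte_addresses, bank_width=BANK_WIDTH_BYTES, num_banks=NUM_BANKS):
    """Largest number of distinct words that fall into the same bank.

    1 means conflict-free; n means the access is modeled as n serialized passes.
    Threads reading the same word are counted once (treated as a broadcast).
    """
    words_per_bank = {}
    for address in byte_addresses:
        word = address // bank_width
        words_per_bank.setdefault(word % num_banks, set()).add(word)
    return max(len(words) for words in words_per_bank.values())


def column_addresses(row_stride_elems, column=0, elem_size=4, threads=32):
    """Byte addresses touched when thread t reads tile[t][column]."""
    return [(t * row_stride_elems + column) * elem_size for t in range(threads)]


if __name__ == "__main__":
    # Row stride 32: every thread lands in the same bank.
    print(conflict_degree(column_addresses(32)))
    # Row stride 33 (one padding element per row): every thread gets its own bank.
    print(conflict_degree(column_addresses(33)))
```

Under these assumptions, the unpadded column access has a conflict degree of 32 and the padded one a degree of 1. Actual hardware behavior depends on the GPU generation and the configured bank width.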


Optimizing Sparse Matrix-vector Multiplication Based on GPU

2012

Tao Zhang Xianbin Xu Jin Hu Shuibing He

In recent years, Graphics Processing Units (GPUs) have attracted the attention of many application developers as powerful massively parallel systems. Compute Unified Device Architecture (CUDA), as a general-purpose parallel computing architecture, makes GPUs an appealing choice for solving many complex computational problems more efficiently. Sparse Matrix-vector Multiplication (SpMV) algorithm...
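For readers unfamiliar with SpMV, a minimal sequential reference may help. The abstract is truncated, so this sketch does not reproduce the paper's optimization. It assumes the common CSR (compressed sparse row) storage format, and the function name `spmv_csr` is hypothetical. The body of the outer loop is the work that a simple one-thread-per-row GPU kernel would assign to each thread.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format).

    row_ptr[i]:row_ptr[i + 1] is the slice of col_idx/values belonging to row i.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):  # each iteration is independent of the others
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


if __name__ == "__main__":
    # A = [[1, 0, 2],
    #      [0, 3, 0],
    #      [4, 0, 5]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [7.0, 6.0, 19.0]
```

Because rows are independent, the outer loop parallelizes directly. The irregular reads of `x` through `col_idx` are a common reason SpMV is treated as memory-bound.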
