Search results for: CUDA platform

Number of results: 19735

2012
Alexandru Pîrjan

In this paper, I have researched and developed solutions for optimizing the stream compaction algorithmic function using the Compute Unified Device Architecture (CUDA). Stream compaction is a common parallel primitive and an essential building block for many data processing algorithms, whose optimization improves the performance of a wide class of parallel algorithms useful in data processing....
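A minimal sketch of the primitive itself (not of the paper's optimizations), assuming Thrust, which ships with the CUDA toolkit: copy_if internally carries out the scan-and-scatter steps that define stream compaction.

    // Stream-compaction sketch: keep the non-zero elements of an input stream.
    #include <thrust/device_vector.h>
    #include <thrust/copy.h>
    #include <cstdio>

    struct is_nonzero {
        __host__ __device__ bool operator()(int x) const { return x != 0; }
    };

    int main() {
        int h_in[8] = {0, 3, 0, 7, 0, 0, 5, 1};
        thrust::device_vector<int> d_in(h_in, h_in + 8);
        thrust::device_vector<int> d_out(8);

        // copy_if runs the scan + scatter phases of compaction on the device.
        auto end = thrust::copy_if(d_in.begin(), d_in.end(), d_out.begin(), is_nonzero());

        for (auto it = d_out.begin(); it != end; ++it)
            printf("%d ", (int)*it);
        printf("\n");   // expected output: 3 7 5 1
        return 0;
    }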

2015
Yu Liu Yang Hong Chun-Yuan Lin Che-Lun Hung

The Smith-Waterman (SW) algorithm has been widely utilized for searching biological sequence databases in bioinformatics. Recently, several works have adopted graphics cards with Graphics Processing Units (GPUs) and the associated CUDA model to enhance the performance of SW computations. However, these works mainly focused on protein database search using intertask parallelization...
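The abstract is truncated, so only the general intertask scheme is visible; a heavily simplified sketch of that scheme follows, assuming one thread per database sequence, a linear gap penalty, a simple match/mismatch score, and a flat char-array layout with per-sequence offsets (all illustrative assumptions, not the authors' implementation).

    // Intertask Smith-Waterman sketch: each thread scores one subject sequence
    // against a shared query and reports the best local-alignment score.
    #include <cuda_runtime.h>

    #define MAX_QUERY 128          // assumed upper bound on query length
    #define MATCH      2
    #define MISMATCH  -1
    #define GAP        1

    __global__ void sw_score_kernel(const char *query, int qlen,
                                    const char *db, const int *offsets,
                                    const int *lengths, int nseq, int *scores)
    {
        int s = blockIdx.x * blockDim.x + threadIdx.x;
        if (s >= nseq) return;

        const char *subject = db + offsets[s];
        int slen = lengths[s];

        int prev[MAX_QUERY + 1];   // DP row for subject position i-1
        int curr[MAX_QUERY + 1];   // DP row for subject position i
        for (int j = 0; j <= qlen; ++j) prev[j] = 0;

        int best = 0;
        for (int i = 1; i <= slen; ++i) {
            curr[0] = 0;
            for (int j = 1; j <= qlen; ++j) {
                int sub = (subject[i - 1] == query[j - 1]) ? MATCH : MISMATCH;
                int h = prev[j - 1] + sub;          // diagonal move
                h = max(h, prev[j] - GAP);          // gap in query
                h = max(h, curr[j - 1] - GAP);      // gap in subject
                h = max(h, 0);                      // local-alignment floor
                curr[j] = h;
                best = max(best, h);
            }
            for (int j = 0; j <= qlen; ++j) prev[j] = curr[j];
        }
        scores[s] = best;
    }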

2014
Vaibhav Tuteja

In this paper we discuss image encryption and decryption using the RSA algorithm, which was earlier used for text encryption. Today it is a crucial concern that proper encryption and decryption be applied so that unauthorized access can be prevented. We intend to build a general RSA algorithm which can be combined with other image processing techniques to provide new methodologies and be...
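As a rough, hedged sketch of what per-pixel RSA on the GPU can look like (textbook RSA without padding, with toy key sizes small enough that intermediate products fit into 64-bit arithmetic; none of this is taken from the paper):

    // One thread per 8-bit pixel; c = m^key mod n via square-and-multiply.
    #include <cuda_runtime.h>

    __device__ unsigned long long modpow(unsigned long long base,
                                         unsigned long long exp,
                                         unsigned long long mod)
    {
        unsigned long long result = 1 % mod;
        base %= mod;
        while (exp > 0) {
            if (exp & 1ULL) result = (result * base) % mod;   // multiply step
            base = (base * base) % mod;                       // square step
            exp >>= 1;
        }
        return result;
    }

    __global__ void rsa_pixels(const unsigned char *in, unsigned int *out,
                               int npixels, unsigned long long key,
                               unsigned long long n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < npixels)
            out[i] = (unsigned int)modpow(in[i], key, n);
    }

    // Encryption passes key = e, decryption key = d; the ciphertext needs more
    // than 8 bits per pixel, hence the wider output buffer.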

2013
Muhammed Al-Mulhem Abdulah AlDhamin Raed Al-Shaikh

Parallel programming languages represent a common theme in the evolution of high performance computing (HPC) systems. There are several parallel programming languages that are directly associated with different HPC systems. In this paper, we compare the performance of three commonly used parallel programming languages, namely OpenMP, MPI, and CUDA. Our performance evaluation of these languages ...
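For context, the CUDA member of such a comparison is usually a small kernel like the one below (the OpenMP and MPI counterparts would be a parallel-for loop and a scatter/compute/gather pattern); this is a generic sketch, not the benchmark used in the paper.

    // Element-wise vector addition, the classic micro-benchmark kernel.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void vadd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMallocManaged(&a, bytes);   // unified memory keeps the sketch short
        cudaMallocManaged(&b, bytes);
        cudaMallocManaged(&c, bytes);
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        vadd<<<(n + 255) / 256, 256>>>(a, b, c, n);
        cudaDeviceSynchronize();

        printf("c[0] = %f\n", c[0]);    // expected 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }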

2012
Ahmad Abdelfattah Jack J. Dongarra David E. Keyes Hatem Ltaief

Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized n...
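The profiling tools named in the abstract (PAPI-CUDA, the CUDA Profiler) expose hardware counters; the lowest-friction in-code alternative is CUDA event timing, sketched below on a placeholder kernel (the kernel and sizes are illustrative, not from the paper).

    // Timing a kernel with CUDA events; cudaEventElapsedTime reports milliseconds.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale_kernel(float *x, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] = x[i] * 2.0f + 1.0f;
    }

    int main() {
        const int n = 1 << 22;
        float *x;
        cudaMalloc(&x, n * sizeof(float));
        cudaMemset(x, 0, n * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        scale_kernel<<<(n + 255) / 256, 256>>>(x, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("kernel time: %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(x);
        return 0;
    }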

2014
Lauro Cássio Martins de Paula Anderson da Silva Soares

This paper presents a parallel implementation of the Hybrid Bi-Conjugate Gradient Stabilized (BiCGStab(2)) iterative method on a Graphics Processing Unit (GPU) for the solution of large, sparse linear systems. This implementation uses CUDA-Matlab integration, in which the method's operations are performed on GPU cores using Matlab built-in functions. The goal is to show that the exploitation...
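The dominant cost in BiCGStab-type solvers is the sparse matrix-vector product; the scalar CSR kernel below (one thread per row) sketches that underlying operation. The paper itself drives the GPU through CUDA-Matlab integration rather than hand-written kernels, so this is only illustrative.

    // Scalar CSR sparse matrix-vector product: y = A * x.
    #include <cuda_runtime.h>

    __global__ void csr_spmv(int nrows, const int *row_ptr, const int *col_idx,
                             const double *vals, const double *x, double *y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= nrows) return;

        double sum = 0.0;
        for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
            sum += vals[k] * x[col_idx[k]];   // accumulate the row's non-zeros
        y[row] = sum;
    }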

2013
Xinbiao Gan Cong liu Zhiying Wang Li Shen Qi Zhu Jie Liu Lihua Chi Yihui Yan Bin Yu

Protein secondary structure prediction is very important for understanding a protein's molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from a protein sequence. However, the running time becomes unbearable with the sharp growth of protein databases. Fortunately, CUDA (Compute Unified Device Architecture) p...
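The per-residue, fixed-window scoring that GOR performs maps naturally onto one CUDA thread per residue; a hedged sketch follows, assuming three states (helix/sheet/coil), the standard 17-residue window, and a flat precomputed information-value table (the table layout is an assumption for illustration, not the authors' code).

    // One thread per residue: sum window scores per state, keep the argmax.
    #include <cuda_runtime.h>

    #define WIN     17             // standard GOR window size
    #define HALF     8
    #define NSTATES  3             // H, E, C
    #define NAA     20

    __global__ void gor_predict(const unsigned char *seq, int len,
                                const float *info,    // [NSTATES][WIN][NAA], flattened
                                unsigned char *pred)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= len) return;

        float best = -1e30f;
        int best_s = 0;
        for (int s = 0; s < NSTATES; ++s) {
            float score = 0.0f;
            for (int w = -HALF; w <= HALF; ++w) {
                int p = i + w;
                if (p < 0 || p >= len) continue;       // window falls off the ends
                int aa = seq[p];                        // residue index 0..19
                score += info[(s * WIN + (w + HALF)) * NAA + aa];
            }
            if (score > best) { best = score; best_s = s; }
        }
        pred[i] = (unsigned char)best_s;
    }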

Journal: Computers & Mathematics with Applications 2011
Christian Obrecht Frédéric Kuznik Bernard Tourancheau Jean-Jacques Roux

Emerging many-core processors, like CUDA-capable nVidia GPUs, are promising platforms for regular parallel algorithms such as the Lattice Boltzmann Method (LBM). Since global memory on graphics devices shows high latency and LBM is data intensive, the memory access pattern is an important issue for achieving good performance. Whenever possible, global memory loads and stores should be coalesced and a...
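The coalescing concern the abstract raises is largely a data-layout question; the sketch below assumes a structure-of-arrays layout for the distribution functions, f[q * ncells + cell], so that consecutive threads (consecutive cells) touch consecutive addresses. D2Q9 and the BGK collision form are illustrative assumptions, not the authors' kernels.

    // BGK collision step over an SoA-stored distribution array.
    #include <cuda_runtime.h>

    #define Q 9   // D2Q9 lattice

    __global__ void lbm_collide_bgk(float *f, const float *feq, int ncells, float omega)
    {
        int cell = blockIdx.x * blockDim.x + threadIdx.x;
        if (cell >= ncells) return;

        // All values for a given direction q are contiguous, so the access
        // f[q * ncells + cell] is coalesced across a warp of consecutive cells.
        for (int q = 0; q < Q; ++q) {
            int idx = q * ncells + cell;
            f[idx] = f[idx] - omega * (f[idx] - feq[idx]);
        }
    }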

Journal: CoRR 2013
Bogdan Oancea Tudorel Andrei Raluca Mariana Dragoescu

Parallel computing can offer an enormous performance advantage for very large applications in almost any field: scientific computing, computer vision, databases, data mining, and economics. GPUs are high performance many-core processors that can achieve very high FLOP rates. Since the first idea of using GPUs for general purpose computing, things have evolved, and now there are sever...
