CUDA platform

CUDA Accelerated LTL Model Checking

2009

Jiří Barnat Luboš Brim Milan Češka

Recent technological developments have made various many-core hardware platforms available. For example, SIMD-like hardware has become easily accessible to the many users whose computers are equipped with modern NVIDIA GPU cards with CUDA technology. In this paper we redesign the maximal accepting predecessors algorithm [7] for LTL model checking in terms of matrix-vector product in ord...

Face recognition using a hybrid local binary pattern on a graphics processor to accelerate the identification of individuals at military bases

Journal: علوم و فناوری دریا (Marine Science and Technology) 2019

Kambiz Tabatabaei Ardakani, Kamal Mirzaei

Given the popularity and ever-increasing use of digital devices in everyday life, the growth of image sharing on social networks such as Facebook, Flickr, Instagram and others, and the uploading of various videos to these networks, the use of digital images has grown considerably, especially over the last decade. A high percentage of these images are of human faces, and in cases such as online image monitoring...

Asynchronous Parallel Computing Algorithm implemented in 1D Heat Equation with CUDA

2015

Kooktae Lee Raktim Bhattacharya

In this note, we present a stability and performance analysis of an asynchronous parallel computing algorithm applied to the 1D heat equation with CUDA. The primary objective of this note is to disseminate the asynchronous parallel computing algorithm by providing CUDA code for fast and easy implementation. We show that the simulations carried out on NVIDIA GPU device with asynchronous...
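As a rough illustration of the idea in this abstract (not the authors' CUDA code), the sketch below updates the 1D heat equation with the explicit scheme u_i ← u_i + r(u_{i-1} - 2u_i + u_{i+1}). The asynchronous variant assumes the domain is split between two workers that refresh their copy of the neighbour's boundary cell only every `delay` steps. The names `ftcs_step`, `simulate_sync`, `simulate_async`, `split` and `delay` are hypothetical, and `r` stands for the usual diffusion number (kept at or below 0.5 for stability of the synchronous scheme).

```python
def ftcs_step(u, r):
    """One explicit update of the 1D heat equation; end values stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def simulate_sync(u0, r, steps):
    """Synchronous baseline: every cell sees up-to-date neighbours."""
    u = list(u0)
    for _ in range(steps):
        u = ftcs_step(u, r)
    return u


def simulate_async(u0, r, steps, split, delay):
    """Two workers own cells [0, split) and [split, n).

    Each worker keeps a halo copy of the other's boundary cell and
    refreshes it only every `delay` steps (assumed communication model).
    """
    u = list(u0)
    halo_for_left = u[split]       # right neighbour of cell split-1
    halo_for_right = u[split - 1]  # left neighbour of cell split
    for k in range(steps):
        if k % delay == 0:
            halo_for_left = u[split]
            halo_for_right = u[split - 1]
        new = u[:]
        for i in range(1, len(u) - 1):
            left = halo_for_right if i == split else u[i - 1]
            right = halo_for_left if i == split - 1 else u[i + 1]
            new[i] = u[i] + r * (left - 2.0 * u[i] + right)
        u = new
    return u


if __name__ == "__main__":
    u0 = [0.0, 0.0, 1.0, 0.0, 0.0]
    print(simulate_sync(u0, 0.25, 1))   # prints [0.0, 0.25, 0.5, 0.25, 0.0]
    print(simulate_async(u0, 0.25, 10, split=2, delay=3))
```

With `delay=1` the halo copies are always fresh, so the asynchronous sketch reproduces the synchronous result; larger delays let each worker proceed on stale boundary data.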

CPU AND GPU (CUDA) TEMPLATE MATCHING COMPARISON / CPU IR GPU (CUDA) PALYGINIMAS VYKDANT ŠABLONŲ ATITIKTIES ALGORITMĄ

Journal: Mokslas – Lietuvos ateitis 2014

Parallel Prefix Scan with Compute Unified Device Architecture (CUDA)

2014

B. MUNI LAVANYA

Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms including polynomial evaluation, sorting and building data structures. This paper introduces prefix scan and also describes a step-by-step procedure to implement prefix scan efficiently with Compute Unified Device Architecture (CUDA). This paper starts with a basic naive algorithm and procee...
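To make the "basic naive algorithm" concrete, here is a minimal sequential sketch of the well-known Hillis-Steele inclusive scan, which is the usual naive starting point for GPU scans. It is an assumption that the paper begins from this variant. The function name `naive_scan` is hypothetical, and the inner loop stands in for the threads that would run in parallel on a GPU.

```python
def naive_scan(values):
    """Inclusive prefix sum using the naive (Hillis-Steele) scan pattern."""
    data = list(values)
    n = len(data)
    offset = 1
    while offset < n:
        # Double buffering: all reads come from the previous pass, as they
        # would when every thread of a pass executes simultaneously.
        prev = data[:]
        for i in range(offset, n):  # one "thread" per element i
            data[i] = prev[i] + prev[i - offset]
        offset *= 2                 # log2(n) passes in total
    return data


if __name__ == "__main__":
    print(naive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
    # prints [3, 4, 11, 11, 15, 16, 22, 25]
```

This variant performs on the order of n log n additions, which is why work-efficient scans that need only about 2n additions are generally preferred for large inputs.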

15-740 Project Milestone Report

2007

Xi Liu Fan Guo

The ideal choice for the tasks presented in the project proposal would be the NVIDIA CUDA toolkit (http://developer.nvidia.com/object/cuda.html), which exposes more of the underlying architecture to programmers. However, the package requires a capable NVIDIA video card, which we could not obtain for this project. ATI also designed a similar platform, “Close-to-Metal (CTM) Device” (http://ati.de/companyinf...

Spoc: GPGPU Programming through Stream Processing with OCaml

Journal: Parallel Processing Letters 2012

Mathias Bourgoin Emmanuel Chailloux Jean Luc Lamotte

ions. Skeletons and Composition: tomorrow 4:30pm, OpenGPU workshop. DSL: embedded language to express kernels. Real-world use case: 2DRMP, Dimensional R-matrix propagation (Computer Physics Communications); simulates electron scattering from H-like atoms and ions at intermediate energies. Multi-architecture: multicore, GPGPU, clusters, GPU clusters. Translate from Fortran + CUDA to OCaml + SPOC + CUDA/Op...

Random number generators for massively parallel simulations on GPU

2012

Markus Manssen Martin Weigel Alexander K. Hartmann

High-performance streams of (pseudo) random numbers are crucial for the efficient implementation of countless stochastic algorithms, most importantly Monte Carlo simulations and molecular dynamics simulations with stochastic thermostats. A number of implementations of random number generators have been discussed for GPU platforms before, and some generators are even included in the CUDA support...

GPU Acceleration for Particle Filter based LDPC Decoding

2009

Shuang Wang Lijuan Cui Samuel Cheng Robert C. Huck

A parallel belief propagation algorithm based on Particle Filtering (PF) for channel estimation and Low-Density Parity-Check (LDPC) decoding, implemented with Compute Unified Device Architecture (CUDA), is presented in this paper. The authors have found that, compared with the traditional Belief Propagation (BP) algorithm with fixed estimated noise power, the BP algorithm based on PF [1] not only gives a good...

Effects of Easy Hybrid Parallelization with CUDA for OpenMX

2014

Jae-Hyeon Parq Erik Sevre Sang-Mook Lee

An MPI-friendly density functional theory (DFT) source code was modified for hybrid parallelization including CUDA. The objective is to find out how simple conversions within the hybrid parallelization with mid-range GPUs affect DFT code not originally suited to CUDA. Several rules of hybrid parallelization for numerical-atomic-orbital (NAO) DFT codes were established. The test was performed on...
