CUDA platform

Search results for: CUDA platform

Number of results: 19735

Curve-Fitting on Graphics Processors Using Particle Swarm Optimization

Journal: Int. J. Computational Intelligence Systems, 2014

R. T. Kneusel

Curve fitting is a fundamental task in many research fields. In this paper we present results demonstrating the fitting of 2D images using CUDA (compute unified device architecture) on NVIDIA graphics processors via particle swarm optimization (PSO). Particle swarm optimization is particularly well-suited to implementation on graphics processors using CUDA as each CUDA thread can be made to mod...
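
A minimal sketch of the idea, assuming a simpler target than the paper's 2D image fitting: global-best PSO fitting a straight line y = a*x + b by least squares. Each particle holds one candidate (a, b), and its velocity update, position update and loss evaluation touch only its own state plus the shared global best, which is what lets a CUDA implementation give each particle its own thread. The function name `pso_fit_line`, the coefficients w = 0.7 and c1 = c2 = 1.5, and the swarm size are illustrative choices, not values from the paper; the code is sequential Python standing in for a per-thread kernel.

```python
import random


def pso_fit_line(xs, ys, n_particles=30, n_iters=300, seed=0):
    """Fit y = a*x + b by minimizing the sum of squared errors with global-best PSO.

    Hypothetical helper for illustration; each particle is one candidate (a, b).
    """
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (assumed values)

    def loss(p):
        a, b = p
        return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

    pos = [[rng.uniform(-10, 10), rng.uniform(-10, 10)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        # Per-particle work: independent, so on a GPU each i would be one thread.
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
        # Reduction step: refresh the swarm-wide best once per iteration.
        g = min(range(n_particles), key=lambda i: pbest_val[i])
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g][:], pbest_val[g]
    return gbest, gbest_val
```

For example, `pso_fit_line([-2, -1, 0, 1, 2], [-3, -1, 1, 3, 5])` should return parameters close to a = 2, b = 1. The global best is refreshed only once per iteration, mirroring a synchronous GPU step in which a reduction over all particles follows the parallel update.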

Designing fast LTL model checking algorithms for many-core GPUs

Journal: J. Parallel Distrib. Comput., 2012

Jiri Barnat, Petr Bauch, Lubos Brim, Milan Ceska

Recent technological developments have made various many-core hardware platforms widely accessible. These massively parallel architectures have been used to significantly accelerate many computationally demanding tasks. In this paper we show how the algorithms for LTL model checking can be redesigned in order to accelerate LTL model checking on many-core GPU platforms. Our detailed experimental evaluati...
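
The abstract does not name the redesigned algorithms. As a hedged illustration, assuming an OWCTY-style (One Way Catch Them Young) accepting-cycle check of the kind used in parallel LTL model checking, the sketch below replaces nested depth-first search with whole-set reachability and elimination passes, each of which is a data-parallel sweep over states and edges. It is sequential Python, not the paper's GPU code; the adjacency dict `succ` stands in for the product automaton's state space, and `has_accepting_cycle` is a hypothetical name.

```python
from collections import deque


def has_accepting_cycle(succ, init, accepting):
    """OWCTY-style check: is there a reachable cycle through an accepting state?

    succ: dict mapping each state to a list of successor states.
    """
    # Reachable states from init (a level-synchronous BFS on a GPU).
    S = {init}
    frontier = deque([init])
    while frontier:
        u = frontier.popleft()
        for v in succ.get(u, []):
            if v not in S:
                S.add(v)
                frontier.append(v)

    while True:
        old = set(S)
        # Reachability pass: keep only states reachable inside S from accepting states in S.
        seeds = [s for s in S if s in accepting]
        R = set(seeds)
        frontier = deque(seeds)
        while frontier:
            u = frontier.popleft()
            for v in succ.get(u, []):
                if v in S and v not in R:
                    R.add(v)
                    frontier.append(v)
        S = R
        # Elimination pass: repeatedly drop states with no predecessor inside S.
        indeg = {s: 0 for s in S}
        for u in S:
            for v in succ.get(u, []):
                if v in S:
                    indeg[v] += 1
        queue = deque(s for s in S if indeg[s] == 0)
        while queue:
            u = queue.popleft()
            S.discard(u)
            for v in succ.get(u, []):
                if v in S:
                    indeg[v] -= 1
                    if indeg[v] == 0:
                        queue.append(v)
        # Fixpoint reached: any surviving state lies on or after an accepting cycle.
        if S == old:
            return bool(S)
```

An LTL property is violated exactly when the product of the system with the Büchi automaton of the negated formula has a reachable accepting cycle, so a `True` result signals that a counterexample exists.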

Simple sorting algorithm test based on CUDA

Journal: CoRR, 2015

Hongyu Meng, Fangjin Guo

With the development of computing technology, CUDA has become a very important tool. In computer programming, sorting algorithms are widely used. There are many simple sorting algorithms, such as enumeration sort, bubble sort and merge sort. In this paper, we test several simple sorting algorithms based on CUDA and draw some useful conclusions.
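
Enumeration sort and bubble sort both have data-parallel forms, which is why they show up in simple GPU sorting tests. The sketch below renders them in sequential Python, assuming one GPU thread per element for enumeration (rank) sort and one thread per compare-exchange pair for odd-even transposition sort, the parallel variant of bubble sort. The function names are illustrative; the paper's own kernels are not reproduced here.

```python
def enumeration_sort(a):
    """Rank (enumeration) sort: each element's final position is the number of
    elements that must precede it. Every rank is independent (one thread per i)."""
    n = len(a)
    out = [None] * n
    for i in range(n):  # on a GPU: i would be the thread index
        rank = 0
        for j in range(n):
            # Ties are broken by original index, so ranks are unique and the sort is stable.
            if a[j] < a[i] or (a[j] == a[i] and j < i):
                rank += 1
        out[rank] = a[i]
    return out


def odd_even_transposition_sort(a):
    """Parallel bubble sort: n phases alternating even and odd adjacent pairs."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        # Pairs within a phase are disjoint, so all compare-exchanges could run at once.
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

Enumeration sort does O(n^2) comparisons but needs no synchronization between threads, while odd-even transposition sort needs n synchronized phases. Merge sort, the third algorithm named in the abstract, is omitted from this sketch.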

Selected Topics in Modern Scientific Computing

2013

Martin Köhler

GPU Computing: CUDA vs. OpenCL. Currently there is no clear standard programming model for applications involving graphics processing units (GPUs). Nvidia, one of the most important hardware manufacturers, is pushing its C language extension CUDA, while its competitor AMD/ATI is following the general OpenCL framework, which in principle can be applied to arbitrary accelerator...

A CUDA-Based Real Parameter Optimization Benchmark

Journal: CoRR, 2014

Ke Ding, Ying Tan

Benchmarking is key to developing and comparing optimization algorithms. In this paper, a CUDA-based real parameter optimization benchmark (cuROB) is introduced. Test functions with diverse properties are included in cuROB and implemented efficiently with CUDA. A speedup of one order of magnitude can be achieved compared with the CPU-based CEC’14 benchmark.
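
cuROB's exact function set is not listed in the abstract. As an assumption, the sketch uses the shifted Rastrigin function, a multimodal test function that also appears in shifted and rotated variants in the CEC'14 suite, to show the evaluation pattern a CUDA benchmark parallelizes: many candidate solutions scored independently. The function names and the shift vector are hypothetical, and rotation is omitted.

```python
import math


def shifted_rastrigin(x, shift):
    """f(x) = sum(z_i^2 - 10*cos(2*pi*z_i) + 10) with z = x - shift.

    Global minimum 0 at x == shift; many local minima around it.
    """
    total = 0.0
    for xi, oi in zip(x, shift):
        z = xi - oi
        total += z * z - 10.0 * math.cos(2.0 * math.pi * z) + 10.0
    return total


def evaluate_population(pop, shift):
    # Each candidate is scored independently; on a GPU one thread (or one thread
    # per dimension followed by a reduction) would handle each candidate.
    return [shifted_rastrigin(x, shift) for x in pop]
```

Because every evaluation is independent, the speedup of such a benchmark comes mainly from scoring a whole population per kernel launch rather than one point at a time.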

Lattice Simulations using OpenACC compilers

Journal: CoRR, 2013

Pushan Majumdar

OpenACC compilers allow one to use graphics processing units without having to write explicit CUDA code. Programs can be modified incrementally using OpenMP-like directives, which cause the compiler to generate CUDA kernels that run on the GPUs. In this article we look at the performance gain in lattice simulations with dynamical fermions using OpenACC compilers.

Extending OmpSs to support CUDA and OpenCL in C, C++ and Fortran Applications

2014

Florentino Sainz, Sergi Mateo, Vicenç Beltran, Jose L. Bosque, Eduard Ayguadé

CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although this model is a clear improvement over a low-level and ad-hoc programming model for each hardware accelerator, it is still too complex and cumberso...

Accelerating Haze Removal Algorithm Using CUDA

Journal: Remote Sensing, 2020

C2CU: A CUDA C Program Generator for Bulk Execution of a Sequential Algorithm

Journal: Concurrency and Computation: Practice and Experience, 2014

Daisuke Takafuji, Koji Nakano, Yasuaki Ito

We present a time-optimal implementation for bulk execution of an oblivious sequential algorithm. Our second contribution is a tool, named C2CU, which automatically generates a CUDA C program for bulk execution of an oblivious sequential algorithm.
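
A sequential algorithm is oblivious when its sequence of memory addresses depends only on the input size, not on the input values, so many instances can run in lockstep. The sketch below illustrates bulk execution under that definition, assuming in-place prefix sums as the oblivious algorithm and an interleaved layout in which element k of input i is stored at address k*p + i, so that at each step the p threads access consecutive addresses (which coalesces on a GPU). This is a hand-written Python illustration, not C2CU output, and the function names are hypothetical.

```python
def prefix_sums_oblivious(mem, n, addr):
    """In-place prefix sums over n logical cells.

    The address sequence depends only on n, so the algorithm is oblivious.
    addr(k) maps logical index k to a physical address in mem.
    """
    for k in range(1, n):
        mem[addr(k)] += mem[addr(k - 1)]


def bulk_execute(inputs):
    """Run the same oblivious algorithm on p equal-length inputs."""
    p, n = len(inputs), len(inputs[0])
    # Interleaved (column-wise) layout: element k of input i lives at k*p + i,
    # so at each step of the lockstep loop the p instances touch p adjacent cells.
    mem = [0] * (p * n)
    for i, inp in enumerate(inputs):
        for k, v in enumerate(inp):
            mem[k * p + i] = v
    for i in range(p):  # on a GPU: one thread per input, all executing in lockstep
        prefix_sums_oblivious(mem, n, lambda k, i=i: k * p + i)
    return [[mem[k * p + i] for k in range(n)] for i in range(p)]
```

For instance, `bulk_execute([[1, 2, 3], [4, 5, 6]])` yields the prefix sums of both inputs. The design point is the layout: with a row-per-input layout, the same lockstep step would make threads access addresses n apart, which defeats coalescing.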

A Multi-Stage CUDA Kernel for Floyd-Warshall

Journal: CoRR, 2010

Ben D. Lund, Justin W. Smith

We present a new implementation of the Floyd-Warshall all-pairs shortest paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the best previously reported algorithm. To achieve this speedup, we applied a new technique that reduces the use of on-chip shared memory and allows the CUDA scheduler to hide instruction latency more effectively.
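
For orientation, the sketch below shows the standard three-phase blocked (tiled) formulation of Floyd-Warshall that GPU implementations commonly build on: for each pivot tile, update the pivot tile, then the tiles in its row and column, then all remaining tiles. It is an assumption that this matches the structure of the paper's multi-stage kernel, and it does not reproduce the shared-memory technique the abstract describes. Tiles inside phases 2 and 3 are independent, so each would map to a CUDA thread block; here they run sequentially in Python. The sketch assumes no negative cycles, a zero diagonal, and n divisible by the tile size b.

```python
INF = float("inf")


def blocked_floyd_warshall(dist, b):
    """Three-phase tiled Floyd-Warshall on an n x n distance matrix (n % b == 0)."""
    n = len(dist)
    d = [row[:] for row in dist]  # work on a copy; the input is left untouched
    nb = n // b

    def update(bi, bj, r):
        # Relax tile (bi, bj) through every intermediate vertex k in pivot block r.
        for k in range(r * b, (r + 1) * b):
            for i in range(bi * b, (bi + 1) * b):
                dik = d[i][k]
                for j in range(bj * b, (bj + 1) * b):
                    if dik + d[k][j] < d[i][j]:
                        d[i][j] = dik + d[k][j]

    for r in range(nb):
        update(r, r, r)              # phase 1: the pivot tile depends only on itself
        for t in range(nb):          # phase 2: tiles in the pivot row and pivot column
            if t != r:
                update(r, t, r)
                update(t, r, r)
        for bi in range(nb):         # phase 3: all remaining tiles, mutually independent
            for bj in range(nb):
                if bi != r and bj != r:
                    update(bi, bj, r)
    return d
```

With b = 1 this degenerates to the textbook triple loop, and with b = n it is a single tile; intermediate tile sizes trade parallelism across tiles against data reuse within a tile.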