Search results for: "بستر cuda" (CUDA platform)
Number of results: 19735
Curve fitting is a fundamental task in many research fields. In this paper we present results demonstrating the fitting of 2D images using CUDA (compute unified device architecture) on NVIDIA graphics processors via particle swarm optimization (PSO). Particle swarm optimization is particularly well-suited to implementation on graphics processors using CUDA as each CUDA thread can be made to mod...
Recent technological developments made various many-core hardware platforms widely accessible. These massively parallel architectures have been used to significantly accelerate many computation demanding tasks. In this paper we show how the algorithms for LTL model checking can be redesigned in order to accelerate LTL model checking on many-core GPU platforms. Our detailed experimental evaluati...
With the development of computing technology, CUDA has become a very important tool. In computer programming, sorting algorithms are widely used. There are many simple sorting algorithms, such as enumeration sort, bubble sort, and merge sort. In this paper, we test several simple sorting algorithms implemented in CUDA and draw some useful conclusions.
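As an illustration of why enumeration (rank) sort is a natural candidate for CUDA, here is a minimal CPU-side sketch in Python. It is not the paper's implementation; the function name `enumeration_sort` and the tie-breaking rule are assumptions made for this example. Each iteration of the outer loop is independent of the others, so on a GPU it could be assigned to its own thread.

```python
def enumeration_sort(values):
    """Sort by computing each element's final rank independently.

    Hypothetical helper for illustration only. The outer loop body has no
    dependency between iterations, which is the property that would let a
    CUDA kernel run one thread per element.
    """
    n = len(values)
    out = [None] * n
    for i in range(n):  # on a GPU: one thread per index i
        rank = 0
        for j in range(n):
            # Count elements that must come before values[i].
            # Ties are broken by original index so every rank is unique.
            if values[j] < values[i] or (values[j] == values[i] and j < i):
                rank += 1
        out[rank] = values[i]
    return out


if __name__ == "__main__":
    print(enumeration_sort([3, 1, 2, 3, 0]))
```

The sketch performs O(n^2) comparisons in total, so any benefit on a GPU would come from running the n rank computations concurrently, not from doing less work.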
GPU Computing: CUDA vs. OpenCL. Currently there is no clear standard for the programming model in applications involving graphics processing units (GPUs). Nvidia, as one of the most important hardware manufacturers, is pushing its C language extension CUDA, while its competitor AMD/ATI is following the general OpenCL framework, which in principle can be applied to arbitrary accelerator...
Benchmarking is key for developing and comparing optimization algorithms. In this paper, a CUDA-based real parameter optimization benchmark (cuROB) is introduced. Test functions of diverse properties are included within cuROB and implemented efficiently with CUDA. A speedup of one order of magnitude can be achieved in comparison with the CPU-based CEC’14 benchmark.
OpenACC compilers allow one to use Graphics Processing Units without having to write explicit CUDA code. Programs can be modified incrementally using OpenMP-like directives, which cause the compiler to generate CUDA kernels to be run on the GPUs. In this article we look at the performance gain in lattice simulations with dynamical fermions using OpenACC compilers.
CUDA and OpenCL are the most widely used programming models to exploit hardware accelerators. Both programming models provide a C-based programming language to write accelerator kernels and a host API used to glue the host and kernel parts. Although this model is a clear improvement over a low-level and ad-hoc programming model for each hardware accelerator, it is still too complex and cumberso...
C2CU: A CUDA C Program Generator for Bulk Execution. We present a time-optimal implementation for bulk execution of an oblivious sequential algorithm. Our second contribution is to develop a tool, named C2CU, which automatically generates a CUDA C program for a bulk execution of an oblivious sequential algorithm.
We present a new implementation of the Floyd-Warshall All-Pairs Shortest Paths algorithm on CUDA. Our algorithm runs approximately 5 times faster than the best previously reported algorithm. To achieve this speedup, we applied a new technique that reduces usage of on-chip shared memory and allows the CUDA scheduler to hide instruction latency more effectively.
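For reference, the textbook Floyd-Warshall recurrence that such implementations build on can be sketched on the CPU as follows. This is the standard triple loop, not the shared-memory technique described in the abstract; the function name `floyd_warshall` and the adjacency-matrix input format are assumptions made for this example. For a fixed `k`, the updates over all `(i, j)` pairs are independent, which is the part a CUDA kernel would typically parallelize.

```python
import math


def floyd_warshall(dist):
    """Return all-pairs shortest path lengths for an adjacency matrix.

    Hypothetical helper for illustration only. dist[i][j] is the edge
    weight from i to j, math.inf if there is no edge, and 0 on the
    diagonal. Assumes the graph has no negative cycles.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input is left unchanged
    for k in range(n):            # sequential: step k depends on step k-1
        row_k = d[k]
        for i in range(n):        # on a GPU: the (i, j) pairs run in parallel
            d_ik = d[i][k]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < d[i][j]:
                    d[i][j] = candidate
    return d


if __name__ == "__main__":
    INF = math.inf
    graph = [
        [0, 3, INF],
        [INF, 0, 1],
        [2, INF, 0],
    ]
    for row in floyd_warshall(graph):
        print(row)
```

The outer loop over `k` must stay sequential, so a GPU version would launch (or synchronize) once per `k` step, and the memory traffic for row `k` and column `k` is where optimizations such as the one in the abstract would apply.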