Search results for: بستر cuda
Number of results: 19735
In recent years CUDA has become a major architecture for multithreaded computations. Unfortunately, its potential is not yet being commonly utilized because many fundamental problems have no practical solutions for such machines. Our goal is to establish a hybrid multicore/parallel theoretical model that represents well architectures like NVIDIA CUDA, Intel Larrabee, and OpenCL as well as admits...
Recently, general purpose GPU (GPGPU) programming has spread rapidly after CUDA was first introduced to write parallel programs in high-level languages for NVIDIA GPUs. While a GPU exploits data parallelism very effectively, task-level parallelism is exploited as a multi-threaded program on a multicore CPU. For such a heterogeneous platform that consists of a multicore CPU and GPU, in this pape...
We present a GPU functional simulator targeting GPGPU based on the UNISIM framework which takes unaltered NVIDIA CUDA executables as input. It simulates the native instruction set of the Tesla architecture at the functional level and generates detailed execution statistics. Simulation speed is competitive with the less-accurate CUDA emulation mode thanks to optimizations which exploit the inher...
CuPy is an open-source library with NumPy syntax that increases speed by doing matrix operations on NVIDIA GPUs. It is accelerated with the CUDA platform from NVIDIA and also uses CUDA-related libraries, including cuBLAS, cuDNN, cuRAND, cuSOLVER, cuSPARSE, and NCCL, to make full use of the GPU architecture. CuPy’s interface is highly compatible with NumPy; in most cases it can be used as a dr...
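The NumPy compatibility described in this abstract can be illustrated with a minimal sketch. It assumes that the same array code can be written once against a module alias (here the hypothetical name `xp`) and falls back to NumPy when CuPy or a usable GPU is not available; the function `normalized_gram` is an illustrative helper, not part of either library.

```python
import numpy as np

try:
    import cupy as xp      # GPU-backed arrays with a NumPy-like interface
    xp.zeros(1)            # touch the device so a missing GPU fails here
except Exception:
    xp = np                # assumption: fall back to plain NumPy on the CPU


def normalized_gram(a):
    """Return A^T A divided by its largest entry, using the active array module."""
    g = a.T @ a
    return g / g.max()


a = xp.arange(6, dtype=xp.float64).reshape(3, 2)
result = normalized_gram(a)

# CuPy arrays expose .get() to copy data back to host memory; NumPy arrays do not.
result_host = result.get() if hasattr(result, "get") else result
print(result_host)
```

Only the import changes between the two back ends; the array expressions themselves are identical, which is the sense in which the interface is compatible in most cases.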
This paper presents CUDA-based parallelization of implicit incompressible SPH (IISPH) on the GPU. Along with the detailed exposition of our implementation, we analyze various components involved for their costs. We show that our CUDA version achieves near linear scaling with the number of particles and is faster than the multi-core parallelized IISPH on the CPU. We also present a basic comparis...
Connected component labeling (CCL) is a task of detecting connected regions in input data, and it finds its applications in pattern recognition, computer vision, and image processing. We present a new algorithm for connected component labeling in 2-D images implemented in CUDA. We first provide a brief overview of the CCL problem together with existing CPU-oriented algorithms. The rest of the c...
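To make the CCL task itself concrete, the following is a minimal CPU-side sketch of the classic two-pass, union-find approach on a small binary grid. It is not the CUDA algorithm from this abstract; the function name `label_components` and the choice of 4-connectivity are assumptions made for illustration.

```python
def label_components(image):
    """Label 4-connected foreground regions of a 2-D binary grid (two-pass union-find)."""
    rows, cols = len(image), len(image[0])
    parent = {}  # provisional label -> parent label

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labels = [[0] * cols for _ in range(rows)]
    next_label = 1

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up == 0 and left == 0:
                parent[next_label] = next_label
                labels[r][c] = next_label
                next_label += 1
            else:
                labels[r][c] = min(l for l in (up, left) if l)
                if up and left:
                    union(up, left)

    # Second pass: replace each provisional label by a compact final label.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
labels, count = label_components(image)
print(count)
for row in labels:
    print(row)
```

The sequential raster scan is what makes this version CPU-oriented; each pixel depends on its already-labeled upper and left neighbors.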
A forward model characterizes the relationship between a source and a measurement in computational form. In TMS, such models are used for planning and targeting stimulation. The contribution of the head volume conductor to ... is typically solved using BEM or FEM. Here we consider, in the context of real-time TMS navigation, the performance benefits of C++ GPU (CUDA) code over MATLAB. MATLAB highly efficient matrix co...