CUDA platform

Search results for: CUDA platform

Number of results: 19735

CUDA based Parallel Derivation of Parametric L-system

2011

Ji Liu Sulan Zhang Lingqiu Zeng Qingsheng Zhu Songyang Li

This paper proposes an approach to derive a parametric L-system in parallel based on Compute Unified Device Architecture (CUDA). It consists of a host program running on the CPU and a device program running on a CUDA-enabled GPU. The host program is used to transfer data between CPU and GPU, pre-allocate host and device memory, and launch the device program. The device program takes charge of derivin...


Efficient CUDA

2017

Cody Coleman Daniel Kang

Recent work in deep learning has created a voracious demand for more compute cycles, but with the slowdown of Moore’s law, CPUs cannot keep up with the demand. Thus, attention has turned to massively parallel hardware accelerators, ranging from new processing units designed specifically for deep learning to the repurposing of existing technologies like field-programmable gate arrays (FPGAs) and ...


Accelerating GOR Algorithm Using CUDA

Journal: Applied Mathematics & Information Sciences 2013


An Implementation of Ray Tracing in CUDA

2009

Liang Chen Hirakendu Das Shengjun Pan

In computer graphics, ray tracing is a popular technique for rendering images with computers. In this project we implemented a serial version of ray tracing in C, and three parallelized versions in CUDA. We conducted experiments to demonstrate speedup with CUDA, as well as the importance of balancing workload among threads.


Stereoscopic video chroma key processing using NVIDIA CUDA

Journal: Annales UMCS, Informatica 2013

Jaroslaw Sagan

In this paper, I use the NVIDIA CUDA technology to perform the chroma key algorithm on stereoscopic images. NVIDIA CUDA allows parallel algorithms to be run on the GPU. The input data are stereoscopic images with a monochromatic background and a destination background image. The output is the combination of the inputs obtained by applying the chroma key. I compare the algorithm efficiency between the GPU and CPU...
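To make the per-pixel nature of chroma keying concrete, here is a minimal CPU-side sketch in Python. It is not the paper's implementation: the function name `chroma_key`, the Euclidean colour-distance test, and the `tolerance` parameter are all assumptions chosen for illustration. Each output pixel depends only on the matching input pixels, which is why the operation suits one-thread-per-pixel execution on a GPU.

```python
def chroma_key(foreground, background, key, tolerance):
    """Replace pixels close to the key colour with the background pixel.

    Images are lists of rows; each pixel is an (r, g, b) tuple.
    The squared-distance threshold is an illustrative assumption.
    """
    limit = tolerance ** 2
    result = []
    for fg_row, bg_row in zip(foreground, background):
        out_row = []
        for fg, bg in zip(fg_row, bg_row):
            # Squared Euclidean distance between the pixel and the key colour.
            dist2 = sum((a - b) ** 2 for a, b in zip(fg, key))
            out_row.append(bg if dist2 <= limit else fg)
        result.append(out_row)
    return result


# A stereoscopic pair is handled by keying the left and right images separately.
green = (0, 255, 0)
left = [[(0, 250, 5), (200, 10, 10)]]
right = [[(190, 20, 15), (2, 255, 0)]]
new_bg = [[(1, 2, 3), (4, 5, 6)]]

left_out = chroma_key(left, new_bg, green, tolerance=20)
right_out = chroma_key(right, new_bg, green, tolerance=20)
```

In a CUDA version, the two nested loops would typically be replaced by a grid of threads, one per pixel, with the same comparison in the kernel body.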


Performance modeling in CUDA streams - A means for high-throughput data processing

Journal: Proceedings : ... IEEE International Conference on Big Data. IEEE International Conference on Big Data 2014

Hao Li Di Yu Anand Kumar Yi-Cheng Tu

A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rate of such systems requires large computing power provided by the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal...


ACC-SVM: Accelerating SVM on GPUs using OpenACC

2016

Rengan Xu Dounia Khaldi Abid M. Malik Barbara Chapman

GPUs have been successfully applied in scientific computing in the last decade. Many machine learning algorithms have also used GPUs to accelerate their computations. This includes the Support Vector Machine (SVM) which is a classical machine learning algorithm that has been successfully used in many applications such as text classification and image recognition. There have been many open-sourc...


JCUDA: A Programmer-Friendly Interface for Accelerating Java Programs with CUDA

2009

Yonghong Yan Max Grossman Vivek Sarkar

A recent trend in mainstream desktop systems is the use of general-purpose graphics processor units (GPGPUs) to obtain order-of-magnitude performance improvements. CUDA has emerged as a popular programming model for GPGPUs for use by C/C++ programmers. Given the widespread use of modern object-oriented languages with managed runtimes like Java and C#, it is natural to explore how CUDA-like capab...


A Survey on Performance Modelling and Optimization Techniques for SpMV on GPUs

2014

C. R. Barde

A sparse matrix is a matrix with very few non-zero entries. Large sparse matrices are often used in engineering and scientific operations. Sparse matrix-vector multiplication in particular is an important operation for solving linear systems and partial differential equations. However, there is a possibility that even though the matrix is partitioned and stored appropriately, the performance...
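For readers unfamiliar with the operation, the following is a minimal sketch of sparse matrix-vector multiplication in Python using the compressed sparse row (CSR) layout. The choice of CSR and the name `spmv_csr` are assumptions for illustration; the abstract does not state which storage formats the survey covers.

```python
def spmv_csr(values, col_indices, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values      -- non-zero entries, row by row
    col_indices -- column index of each entry in `values`
    row_ptr     -- row_ptr[i]:row_ptr[i + 1] spans the entries of row i
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        # Only the stored non-zeros of this row contribute to y[row].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_indices[k]]
    return y


# A = [[4, 0, 0],
#      [0, 0, 2],
#      [1, 0, 3]]
values = [4.0, 2.0, 1.0, 3.0]
col_indices = [0, 2, 0, 2]
row_ptr = [0, 1, 2, 4]
x = [1.0, 2.0, 3.0]

y = spmv_csr(values, col_indices, row_ptr, x)
```

A straightforward GPU mapping assigns one thread to each iteration of the outer loop. Rows with very different numbers of non-zeros then give threads unequal work, which is one commonly cited reason SpMV performance depends on the storage format.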


Dynamic Task Parallelism with a GPU Work-Stealing Runtime System

2011

Sanjay Chatterjee Max Grossman Alina Simion Sbîrlea Vivek Sarkar

NVIDIA’s Compute Unified Device Architecture (CUDA) and its attached C/C++ based API went a long way towards making GPUs more accessible to mainstream programming. So far, the use of GPUs for high performance computing has been primarily restricted to data parallel applications, and with good reason. The high number of computational cores and high memory bandwidth supported by the device makes ...

