Search results for: multi gpu
Number of results: 473736
SkePU is a C++ template library with a simple and unified interface for expressing data parallel computations in terms of generic components, called skeletons, on multi-GPU systems using CUDA and OpenCL. The smart containers in SkePU, such as Matrix and Vector, perform data management with a lazy memory copying mechanism that reduces redundant data communication. SkePU provides programmability,...
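For context on the skeleton idea, a skeleton such as Map applies a user-supplied function element-wise over containers while hiding where the data lives. The sketch below is a generic C++ illustration under that assumption, not SkePU's actual API; the name mapSkeleton and the containers are hypothetical stand-ins for SkePU's skeletons and smart containers.

    #include <cstdio>
    #include <vector>

    // Generic "Map" skeleton: applies a user function element-wise.
    // A real skeleton library (such as SkePU) would additionally pick a CUDA,
    // OpenCL, or CPU backend and lazily manage device copies behind the container.
    template <typename T, typename F>
    std::vector<T> mapSkeleton(const std::vector<T>& a, const std::vector<T>& b, F f) {
        std::vector<T> out(a.size());
        for (size_t i = 0; i < a.size(); ++i)
            out[i] = f(a[i], b[i]);
        return out;
    }

    int main() {
        std::vector<float> x{1, 2, 3}, y{4, 5, 6};
        auto z = mapSkeleton(x, y, [](float u, float v) { return u + v; });
        std::printf("%f %f %f\n", z[0], z[1], z[2]);
        return 0;
    }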
Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation to perform SpMV on the GPU. While previous work shows experiments on small to medium-sized sparse matrices, we perform evaluations on large sparse m...
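The SCOO layout itself is specific to the paper; as a baseline point of reference, a plain COO SpMV in CUDA assigns one thread per stored nonzero and accumulates with atomicAdd. The following is a minimal sketch with illustrative names and a tiny hard-coded matrix, not the paper's format or code.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Baseline COO SpMV: y += A*x, one thread per stored nonzero.
    __global__ void spmv_coo(const int* row, const int* col, const float* val,
                             const float* x, float* y, int nnz) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < nnz)
            atomicAdd(&y[row[i]], val[i] * x[col[i]]);  // rows may collide across threads
    }

    int main() {
        // Tiny 2x2 example matrix [[1,2],[0,3]] in COO form, x = (1,1).
        int *row, *col; float *val, *x, *y;
        cudaMallocManaged(&row, 3 * sizeof(int));   cudaMallocManaged(&col, 3 * sizeof(int));
        cudaMallocManaged(&val, 3 * sizeof(float)); cudaMallocManaged(&x, 2 * sizeof(float));
        cudaMallocManaged(&y, 2 * sizeof(float));
        int   r[] = {0, 0, 1}, c[] = {0, 1, 1};
        float v[] = {1.f, 2.f, 3.f};
        for (int i = 0; i < 3; ++i) { row[i] = r[i]; col[i] = c[i]; val[i] = v[i]; }
        x[0] = x[1] = 1.f; y[0] = y[1] = 0.f;
        spmv_coo<<<1, 32>>>(row, col, val, x, y, 3);
        cudaDeviceSynchronize();
        std::printf("y = (%f, %f)\n", y[0], y[1]);  // expect (3, 3)
        return 0;
    }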
Much of the current focus in high-performance computing is on multi-threading, multi-computing, and graphics processing unit (GPU) computing. However, vectorization and non-parallel optimization techniques, which can often be applied in addition, are discussed less frequently. In this paper, we present an analysis of several optimizations done on both central processing unit (CPU) and GPU imp...
This paper studies the performance of separable 2D convolution on multi-lane Polymorphic Register Files (PRFs). We present a matrix transposition algorithm optimized for PRFs, and a 2D vectorized convolution algorithm which avoids strided memory accesses. We compare the throughput of our PRF to the nVidia Tesla C2050 GPU. The results show that even in bandwidth constrained systems, multi-lane P...
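As background, separable 2D convolution factors a K x K filter into a row pass followed by a column pass, reducing the work per pixel from K*K to 2*K multiply-adds. The sketch below shows the row pass only, on the CPU, with zero handling at the borders; the function name and layout are illustrative and unrelated to the paper's PRF implementation.

    #include <vector>

    // One pass of a separable convolution along rows; the column pass is analogous
    // with the roles of x and y swapped. Out-of-range taps are treated as zero.
    void convolveRows(const std::vector<float>& in, std::vector<float>& out,
                      int width, int height, const std::vector<float>& kernel) {
        int r = static_cast<int>(kernel.size()) / 2;
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < width; ++x) {
                float acc = 0.f;
                for (int k = -r; k <= r; ++k) {
                    int xx = x + k;
                    if (xx >= 0 && xx < width)
                        acc += in[y * width + xx] * kernel[k + r];
                }
                out[y * width + x] = acc;
            }
    }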
Medical image registration is time-consuming but can be sped up by employing parallel processing on the GPU. Normalized mutual information (NMI) is a well-performing similarity measure for multi-modal registration. We present CUDA-based solutions for computing NMI on the GPU and compare the results obtained by rigidly registering multi-modal data sets with a CPU-based implementation. Ou...
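NMI is commonly computed from a joint intensity histogram of the two images as NMI(A,B) = (H(A) + H(B)) / H(A,B), where H denotes Shannon entropy. The host-side sketch below assumes the images have already been quantized to `bins` intensity levels; it illustrates the measure only, not the paper's CUDA implementation.

    #include <cmath>
    #include <vector>

    // NMI(A,B) = (H(A) + H(B)) / H(A,B), computed from a joint histogram.
    // a and b are equally sized images quantized to values in [0, bins).
    double nmi(const std::vector<int>& a, const std::vector<int>& b, int bins) {
        std::vector<double> joint(bins * bins, 0.0), pa(bins, 0.0), pb(bins, 0.0);
        for (size_t i = 0; i < a.size(); ++i)
            joint[a[i] * bins + b[i]] += 1.0;
        double n = static_cast<double>(a.size());
        double hab = 0.0, ha = 0.0, hb = 0.0;
        for (int i = 0; i < bins; ++i)
            for (int j = 0; j < bins; ++j) {
                double p = joint[i * bins + j] / n;
                if (p > 0.0) hab -= p * std::log(p);
                pa[i] += p;   // marginal of image A
                pb[j] += p;   // marginal of image B
            }
        for (int i = 0; i < bins; ++i) {
            if (pa[i] > 0.0) ha -= pa[i] * std::log(pa[i]);
            if (pb[i] > 0.0) hb -= pb[i] * std::log(pb[i]);
        }
        return (ha + hb) / hab;
    }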
Due to the power and frequency walls, the trend is now to use multiple GPUs in a given system, much as multiple cores are found in CPU-based systems. However, deepening the resource hierarchy widens the spectrum of factors that may affect the performance of the system. The goal of this paper is to analyze such factors by investigating and benchmarking the NVIDIA Tesla S1070. This s...
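For reference, distributing work across the GPUs in such a system typically starts by enumerating the devices and binding work to each one in turn; the minimal CUDA sketch below uses only standard runtime calls and is not specific to the paper's benchmarks.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);      // e.g. 4 GPUs on a Tesla S1070
        for (int d = 0; d < count; ++d) {
            cudaSetDevice(d);            // make device d current for this host thread
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            std::printf("device %d: %s, %zu MB\n",
                        d, prop.name, prop.totalGlobalMem >> 20);
            // Per-device allocations and kernel launches would go here; each extra
            // GPU adds another level to the resource hierarchy being benchmarked.
        }
        return 0;
    }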
We analyze two parallel finite element implementations of the 2D time-dependent advection diffusion problem, one for multi-core clusters and one for CUDA-enabled GPUs, and compare their performances in terms of time and energy consumption. The parallel CUDA-enabled GPU implementation was derived from the multi-core cluster version. Our experimental results show that a desktop machine with a sin...
The computational performance of multi-GPU applications can be degraded by the data communication between GPUs. To achieve high-speed computation with multiple GPUs, the cost of this data communication must be minimized. In this paper, I propose a multi-GPU computing method for the Swendsen–Wang (SW) multi-cluster algorithm that reduces the data traffic between GPUs. I realize this ...
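The SW-specific traffic-reduction scheme is detailed in the paper; the generic cost it targets is the exchange of boundary (halo) data between GPUs that hold adjacent sub-lattices. A minimal sketch of such an exchange with peer-to-peer copies follows; the function and pointer names are illustrative, and cudaMemcpyPeer falls back to staging through the host when direct peer access is unavailable.

    #include <cuda_runtime.h>

    // Exchange boundary (halo) rows between two GPUs holding adjacent sub-lattices.
    // *_boundary points to a device's own edge row, *_ghost to the copy it keeps of
    // its neighbour's edge row; haloBytes = lattice width * sizeof(int).
    void exchangeHalo(int* d0_ghost, const int* d0_boundary,
                      int* d1_ghost, const int* d1_boundary, size_t haloBytes) {
        cudaMemcpyPeer(d0_ghost, 0, d1_boundary, 1, haloBytes);  // GPU 1 -> GPU 0
        cudaMemcpyPeer(d1_ghost, 1, d0_boundary, 0, haloBytes);  // GPU 0 -> GPU 1
    }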