multi gpu

Matrix Multiplication on High-Density Multi-GPU Architectures: Theoretical and Experimental Investigations

2015

Peng Zhang, Yu-Xiang Gao

Matrix multiplication (MM) is one of the core problems in the high-performance computing domain, and its efficiency affects the performance of almost all matrix problems. The high-density multi-GPU architecture increases the complexity of this classical problem, though it greatly exceeds the capacity of previous homogeneous multicore architectures. In order to fully exploit the potential of suc...


Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

Journal: CoRR 2017

Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda

Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This couple...


Improving Multi-Application Concurrency Support Within the GPU Memory System

Journal: CoRR 2017

Rachata Ausavarungnirun, Christopher J. Rossbach, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Onur Mutlu

GPUs exploit a high degree of thread-level parallelism to efficiently hide long-latency stalls. Thanks to their latency-hiding abilities and continued improvements in programmability, GPUs are becoming a more essential computational resource. Due to the heterogeneous compute requirements of different applications, there is a growing need to share the GPU across multiple applications in large-sca...


Multi-GPU and multi-CPU accelerated FDTD scheme for vibroacoustic applications

Journal: Computer Physics Communications 2015

Jorge Francés, Beatriz Otero, Sergio Bleda, Sergi Gallego, Cristian Neipp, Augusto Márquez, Augusto Beléndez

The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of the propagation of longitudinal and transversal waves in stratified media. The potential of the scheme and the relevance of each acceleration strategy for massive computations in FDTD are demonstrated in this work. In this paper, we propose two new specific implementations of th...


Learning Random Forests on the GPU

2013

Yisheng Liao, Alex Rubinsteyn, Russell Power, Jinyang Li

Random Forests are a popular and powerful machine learning technique, with several fast multi-core CPU implementations. Since many other machine learning methods have seen impressive speedups from GPU implementations, applying GPU acceleration to random forests seems like a natural fit. Previous attempts to use GPUs have relied on coarse-grained task parallelism and have yielded inconclusive or...


Hybrid CPU-GPU Framework for Network Motifs

Journal: CoRR 2016

Ryan A. Rossi, Rong Zhou

Massively parallel architectures such as the GPU are becoming increasingly important due to the recent proliferation of data. In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverages multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets). In addition to the hybrid multi-core CPU-GPU framework, we also invest...


An ITK Implementation of Physics-based Non-rigid Registration Method

2012

Yixun Liu, Andriy Kot, Fotis Drakopoulos, Andriy Fedorov, Andinet Enquobahrie, Olivier Clatz, Nikos Chrisochoides

As part of the ITK v4 project efforts, we have developed ITK filters for physics-based non-rigid registration (PBNRR), which satisfy the following requirements: account for tissue properties in the registration, improve accuracy compared to rigid registration, and reduce execution time using GPU and multi-core accelerators. The implementation has three main components: (1) Feature Point Selec...


Multi-GPU numerical simulation of electromagnetic waves

Journal: ESAIM: Proceedings and Surveys 2014


GPU MrBayes V3.1: MrBayes on Graphics Processing Units for Protein Sequence Data.

Journal: Molecular Biology and Evolution 2015

Shuai Pang, Rebecca J Stones, Ming-Ming Ren, Xiao-Guang Liu, Gang Wang, Hong-ju Xia, Hao-Yang Wu, Yang Liu, Qiang Xie

We present a modified GPU (graphics processing unit) version of MrBayes, called ta(MC)³ (GPU MrBayes V3.1), for Bayesian phylogenetic inference on protein data sets. Our main contributions are 1) utilizing 64-bit variables, thereby enabling ta(MC)³ to process larger data sets than MrBayes; and 2) using Kahan summation to improve accuracy, convergence rates, and consequently runtime. Versus...
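
For readers unfamiliar with the Kahan summation mentioned in this abstract, the following is a minimal Python sketch of the general technique. It is not taken from GPU MrBayes and does not reflect how the authors implemented it; the function name `kahan_sum` and the sample data are assumptions chosen for illustration only.

```python
import math


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats.

    Hypothetical helper for illustration; not code from GPU MrBayes.
    """
    total = 0.0
    compensation = 0.0  # running estimate of low-order bits lost so far
    for x in values:
        y = x - compensation            # re-inject the previously lost bits
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


if __name__ == "__main__":
    # One large term followed by many terms too small to register individually.
    data = [1.0] + [1e-16] * 100_000

    naive = 0.0
    for v in data:
        naive += v

    reference = math.fsum(data)  # correctly rounded sum from the standard library
    print("naive error:", abs(naive - reference))
    print("kahan error:", abs(kahan_sum(data) - reference))
```

The compensation variable carries the rounding error of each addition into the next one, which is why the compensated sum stays close to the reference value when many small terms are added to a large running total.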


Numerical Linear Algebra on Emerging Architectures: the PLASMA and MAGMA Projects

2009

Emmanuel Agullo, Jim Demmel, Jack Dongarra, Bilel Hadri, Jakub Kurzak, Julien Langou, Hatem Ltaief, Piotr Luszczek, Stanimire Tomov

The emergence and continuing use of multi-core architectures and graphics processing units require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of the now-prevailing parallelism. Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) and Matrix Algebra on GPU and Multicore Architectures (MAGMA) are two project...
