Search results for: multi gpu

Number of results: 473736  

2015
Peng Zhang Yu-Xiang Gao

Matrix multiplication (MM) is one of the core problems in the high-performance computing domain, and its efficiency affects the performance of almost all matrix problems. The high-density multi-GPU architecture escalates the complexity of this classical problem, though it greatly exceeds the capacities of previous homogeneous multicore architectures. In order to fully exploit the potential of suc...

Journal: :CoRR 2017
Ammar Ahmad Awan Ching-Hsiang Chu Hari Subramoni Dhabaleswar K. Panda

Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This couple...

Journal: :CoRR 2017
Rachata Ausavarungnirun Christopher J. Rossbach Vance Miller Joshua Landgraf Saugata Ghose Jayneel Gandhi Adwait Jog Onur Mutlu

GPUs exploit a high degree of thread-level parallelism to efficiently hide long-latency stalls. Thanks to their latency-hiding abilities and continued improvements in programmability, GPUs are becoming a more essential computational resource. Due to the heterogeneous compute requirements of different applications, there is a growing need to share the GPU across multiple applications in large-sca...

Journal: :Computer Physics Communications 2015
Jorge Francés Beatriz Otero Sergio Bleda Sergi Gallego Cristian Neipp Augusto Márquez Augusto Beléndez

The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to the study of the propagation of longitudinal and transversal waves in stratified media. The potential of the scheme and the relevance of each acceleration strategy for massive computations in FDTD are demonstrated in this work. In this paper, we propose two new specific implementations of th...

2013
Yisheng Liao Alex Rubinsteyn Russell Power Jinyang Li

Random Forests are a popular and powerful machine learning technique, with several fast multi-core CPU implementations. Since many other machine learning methods have seen impressive speedups from GPU implementations, applying GPU acceleration to random forests seems like a natural fit. Previous attempts to use GPUs have relied on coarse-grained task parallelism and have yielded inconclusive or...

Journal: :CoRR 2016
Ryan A. Rossi Rong Zhou

Massively parallel architectures such as the GPU are becoming increasingly important due to the recent proliferation of data. In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverages multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets). In addition to the hybrid multi-core CPU-GPU framework, we also invest...

2012
Yixun Liu Andriy Kot Fotis Drakopoulos Andriy Fedorov Andinet Enquobahrie Olivier Clatz Nikos Chrisochoides

As part of the ITK v4 project efforts, we have developed ITK filters for physics-based non-rigid registration (PBNRR), which satisfies the following requirements: account for tissue properties in the registration, improve accuracy compared to rigid registration, and reduce execution time using GPU and multi-core accelerators. The implementation has three main components: (1) Feature Point Selec...

Journal: :Molecular biology and evolution 2015
Shuai Pang Rebecca J Stones Ming-Ming Ren Xiao-Guang Liu Gang Wang Hong-ju Xia Hao-Yang Wu Yang Liu Qiang Xie

We present a modified GPU (graphics processing unit) version of MrBayes, called ta(MC)³ (GPU MrBayes V3.1), for Bayesian phylogenetic inference on protein data sets. Our main contributions are 1) utilizing 64-bit variables, thereby enabling ta(MC)³ to process larger data sets than MrBayes; and 2) using Kahan summation to improve accuracy, convergence rates, and consequently runtime. Versus...
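
For readers unfamiliar with the Kahan summation mentioned in the abstract above, the following is a minimal sketch in plain Python. It is not the authors' GPU implementation; the function names `kahan_sum` and `naive_sum` and the sample data are assumptions chosen only for illustration. The idea is to carry a running compensation term that captures the low-order bits lost in each floating-point addition.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation; names here are illustrative only."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error lost so far
    for v in values:
        y = v - compensation            # apply the correction to the next term
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was lost (negated)
        total = t
    return total


# Illustrative data (an assumption): one large term followed by many tiny
# terms that a naive accumulator rounds away entirely.
sample = [1.0] + [1e-16] * 10000

if __name__ == "__main__":
    print("naive:", naive_sum(sample))
    print("kahan:", kahan_sum(sample))
    print("fsum: ", math.fsum(sample))  # exactly rounded reference from the stdlib
```

In this sketch the naive accumulator never moves off 1.0, because each tiny term is smaller than half the spacing between adjacent doubles near 1.0, while the compensated sum stays close to the `math.fsum` reference.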

2009
Emmanuel Agullo Jim Demmel Jack Dongarra Bilel Hadri Jakub Kurzak Julien Langou Hatem Ltaief Piotr Luszczek Stanimire Tomov

The emergence and continuing use of multi-core architectures and graphics processing units require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of the now prevailing parallelism. Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) and Matrix Algebra on GPU and Multicore Architectures (MAGMA) are two project...
