نتایج جستجو برای: multi gpu
تعداد نتایج: 473736 فیلتر نتایج به سال:
Matrix multiplication (MM) is one of the core problems in the high performance computing domain and its efficiency impacts performances of almost all matrix problems. The high-density multi-GPU architecture escalates the complexities of such classical problem, though it greatly exceeds the capacities of previous homogeneous multicore architectures. In order to fully exploit the potential of suc...
Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This couple...
GPUs exploit a high degree of thread-level parallelism to efficiently hide long-latency stalls. Thanks to their latencyhiding abilities and continued improvements in programmability, GPUs are becoming a more essential computational resource. Due to the heterogeneous compute requirements of different applications, there is a growing need to share the GPU across multiple applications in large-sca...
The Finite-Difference Time-Domain (FDTD) method is applied to the analysis of vibroacoustic problems and to study the propagation of longitudinal and transversal waves in a stratified media. The potential of the scheme and the relevance of each acceleration strategy for massively computations in FDTD are demonstrated in this work. In this paper, we propose two new specific implementations of th...
Random Forests are a popular and powerful machine learning technique, with several fast multi-core CPU implementations. Since many other machine learning methods have seen impressive speedups from GPU implementations, applying GPU acceleration to random forests seems like a natural fit. Previous attempts to use GPUs have relied on coarse-grained task parallelism and have yielded inconclusive or...
Massively parallel architectures such as the GPU are becoming increasingly important due to the recent proliferation of data. In this paper, we propose a key class of hybrid parallel graphlet algorithms that leverages multiple CPUs and GPUs simultaneously for computing k-vertex induced subgraph statistics (called graphlets). In addition to the hybrid multi-core CPU-GPU framework, we also invest...
As part of the ITK v4 project efforts, we have developed ITK filters for physics-based non-rigid registration (PBNRR), which satisfies the following requirements: account for tissue properties in the registration, improve accuracy compared to rigid registration, and reduce execution time using GPU and multi-core accelerators. The implementation has three main components: (1) Feature Point Selec...
We present a modified GPU (graphics processing unit) version of MrBayes, called ta(MC)(3) (GPU MrBayes V3.1), for Bayesian phylogenetic inference on protein data sets. Our main contributions are 1) utilizing 64-bit variables, thereby enabling ta(MC)(3) to process larger data sets than MrBayes; and 2) to use Kahan summation to improve accuracy, convergence rates, and consequently runtime. Versus...
The emergence and continuing use of multi-core architectures and graphics processing units require changes in the existing software and sometimes even a redesign of the established algorithms in order to take advantage of now prevailing parallelism. Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) and Matrix Algebra on GPU and Multics Architectures (MAGMA ) are two project...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید