Technical Report WM-CS-2010-03, College of William & Mary, Department of Computer Science: Implementing the Dslash Operator in OpenCL

Authors

  • Andy Kowalski
  • Xipeng Shen
Abstract

The Dslash operator is used in Lattice Quantum Chromodynamics (LQCD) applications to implement a Wilson-Dirac sparse matrix-vector product. The Dslash operation has typically been implemented as a parallel program. Today's Graphics Processing Units (GPUs) are designed to perform highly parallel numerical calculations for 3D graphics rendering, a design that also suits scientific applications such as LQCD's implementation of the Dslash operator. The Scientific Computing group at the Thomas Jefferson National Accelerator Facility (Jefferson Lab) has implemented the Dslash operator for execution on GPUs using NVIDIA's Compute Unified Device Architecture (CUDA). CUDA applications, however, run only on NVIDIA hardware. OpenCL (Open Computing Language) is a new open standard for developing parallel programs across CPUs, GPUs, and other processors. This paper describes the implementation of the Dslash operator using OpenCL, its performance on NVIDIA GPUs compared with CUDA, and its performance on other hardware platforms.

General Terms: Performance, Languages.
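
The following sketch illustrates why Dslash maps well onto GPUs. It is a minimal Python reference for a simplified nearest-neighbour hopping term on a small periodic 4-D lattice. It is not the Jefferson Lab CUDA kernel, nor the report's OpenCL code, and it makes two simplifying assumptions:

- Each link is a single complex phase (U(1)) rather than an SU(3) matrix.
- The spin projectors (1 ∓ γ_μ) of the full Wilson-Dirac operator are dropped.

The names `L`, `NDIM`, `VOL`, `site_index`, `neighbor`, and `hop` are hypothetical and chosen for illustration only.

```python
# Hypothetical parameters: a tiny periodic lattice for illustration only.
L = 4            # sites per dimension
NDIM = 4         # space-time dimensions
VOL = L ** NDIM  # total number of lattice sites


def site_index(coords):
    """Linear index of a 4-D site; coordinate 0 varies fastest."""
    idx = 0
    for c in reversed(coords):
        idx = idx * L + c
    return idx


def neighbor(x, mu, step):
    """Index of the site displaced by `step` (+1 or -1) along direction `mu`,
    with periodic boundary conditions."""
    coords = []
    for _ in range(NDIM):
        coords.append(x % L)
        x //= L
    coords[mu] = (coords[mu] + step) % L
    return site_index(coords)


def hop(U, psi):
    """Simplified hopping term (U(1) links, no spin projectors):
        out[x] = sum_mu U[mu][x] * psi[x + mu] + conj(U[mu][x - mu]) * psi[x - mu]
    Each output site depends only on its 8 neighbours and is computed
    independently, so a GPU can naturally assign one work-item (OpenCL)
    or thread (CUDA) per site."""
    out = [0j] * VOL
    for x in range(VOL):
        acc = 0j
        for mu in range(NDIM):
            xp = neighbor(x, mu, +1)
            xm = neighbor(x, mu, -1)
            acc += U[mu][x] * psi[xp]                # forward hop
            acc += U[mu][xm].conjugate() * psi[xm]   # backward hop
        out[x] = acc
    return out


if __name__ == "__main__":
    # Trivial gauge field (all links = 1) and a constant source field.
    U = [[1 + 0j] * VOL for _ in range(NDIM)]
    psi = [1 + 0j] * VOL
    print(hop(U, psi)[0])  # (8+0j): 4 directions x 2 hops x 1
```

In the full Wilson-Dirac Dslash, each hop also applies a spin projector and a 3×3 complex link matrix, so every site does far more arithmetic. The nearest-neighbour access pattern, however, is the same.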

Similar resources

Technical Report WM-CS-2014-03, College of William & Mary, Department of Computer Science: Enhancing the PRIMME Eigensolver for Computing Accurately Singular Triplets of Large Matrices

The computation of a few singular triplets of large, sparse matrices is a challenging task, especially when the smallest magnitude singular values are needed in high accuracy. Most recent efforts try to address this problem through variations of the Lanczos bidiagonalization method, but algorithmic research is ongoing and without production level software. We show that a more efficient, robust,...

Technical Report WM-CS-2006-08, College of William & Mary, Department of Computer Science: PRIMME: PReconditioned Iterative MultiMethod Eigensolver: Methods and Software Description

This paper describes the PRIMME software package for solving large, sparse Hermitian and real symmetric eigenvalue problems. The difficulty and importance of these problems have increased over the years, necessitating the use of preconditioning and near optimally converging iterative methods. On the other hand, the complexity of tuning or even using such methods has kept them outside the re...

Enhancing Working Memory Capacity in Persian Cochlear Implanted Children: A Clinical Trial Study

Introduction: Sensory deprivations such as hearing impairment that affect sensory input have a secondary impact on cognitive functions such as working memory (WM). WM capacity is an important cognitive component that processes language-related activities. Moreover, several studies have shown a deficit in WM in children with a cochlear implant (CI). We aimed to assess the performance of children...

Technical Report WM-CS-2009-07, College of William & Mary, Department of Computer Science: Program Seminal Behaviors: Automating Input Characterization for Large-Scope Proactive Behavior Prediction

Accurately forecasting how a program behaves is crucial for optimizations in compilers, as well as in other layers of the software execution stack, such as operating systems and virtual machines. This fundamental problem has drawn decades of research endeavors, with various techniques produced, ranging from the estimation in static compilers to profiling-based projection to dynamic sampling in runt...

Publication date: 2010