Implementing Performance Libraries on Graphics Hardware

Authors

  • Matthew M. Trentacoste
  • Doug James
  • Kayvon Fatahalian
  • John Ketch
Abstract

We propose a simple method for implementing floating-point vector math operations and matrix multiplication on graphics hardware, focusing on identifying the details, in both software and hardware, that affect performance and ease of use. Before the graphics processing unit (GPU) can be widely adopted as an additional computation processor, we must address the need for application programming interfaces (APIs) that abstract away the details of the implementation. We focus on providing a high-level interface to the hardware that hides the specifics of implementing the functionality on the GPU while maintaining performance. We then use this interface to implement non-negative matrix factorization, a technique used for feature extraction, to demonstrate the strengths of the library when run on current graphics hardware.
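
The abstract does not say which factorization algorithm the library uses, so the following is only a minimal sketch of how non-negative matrix factorization can be built from the two primitives the abstract names: matrix multiplication and element-wise vector math. It assumes the widely used multiplicative update rules for the Frobenius-norm objective. The function names (`matmul`, `transpose`, `scaled_update`, `nmf`, `frobenius_error`), the `rank`, `iterations`, `eps`, and `seed` parameters, and the plain-Python list-of-rows matrix representation are all hypothetical and chosen for illustration; none of them come from the paper, and no GPU is involved.

```python
import random


def matmul(a, b):
    """Dense product of a (n x k) and b (k x m), both stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def scaled_update(target, num, den, eps):
    """Element-wise target * num / (den + eps); eps guards against division by zero."""
    return [
        [t * n / (d + eps) for t, n, d in zip(tr, nr, dr)]
        for tr, nr, dr in zip(target, num, den)
    ]


def nmf(v, rank, iterations=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix v as w @ h with non-negative factors.

    Assumed algorithm: multiplicative updates for the Frobenius-norm objective.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initial factors.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        h = scaled_update(h, matmul(wt, v), matmul(matmul(wt, w), h), eps)
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        w = scaled_update(w, matmul(v, ht), matmul(w, matmul(h, ht)), eps)
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of v - w @ h."""
    approx = matmul(w, h)
    return sum(
        (x - y) ** 2 for rv, ra in zip(v, approx) for x, y in zip(rv, ra)
    ) ** 0.5
```

Every step of the loop is either a dense matrix product or an element-wise multiply and divide, which is why an algorithm of this shape is a natural test case for a library that exposes exactly those operations.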

Similar resources

Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)

Although sparse matrix-vector multiplication (SPMV) algorithms are simple, they form important parts of linear algebra algorithms in mathematics and physics. Because these algorithms can be run in parallel, graphics processing units (GPUs) have been considered among the best candidates for running them. In recent years, power consumption has been considered as one of the metr...

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite increasing hardware flexibility and software programming toolchain maturity, high-efficiency GPU programming remains difficult: it suffers from high complexity, lo...

Scalable Rendering on PC Clusters

This paper presents initial results from research targeted at the development of cost-effective scalable visualization and rendering technologies. The implementations of two 3D graphics libraries based on the popular sort-last and sort-first parallel rendering techniques are discussed. An important goal of these implementations is to provide scalable rendering capability for extremely large dat...

Automatic Tuning Matrix Multiplication Performance on Graphics Hardware

Graphics hardware’s performance is advancing much faster than the performance of conventional microprocessors. In order to utilize the tremendous computing power of these systems, it is critical to tune software to graphics hardware’s architectural features. The frequent changes in GPUs’ architecture and performance characteristics make it very desirable for such tuning to be automated. This pa...

OpenCL Evaluation for Numerical Linear Algebra Library Development

With the help of CUDA [7], [6], many applications improved their performance by using GPUs. In our project called Matrix Algebra on GPU and Multicore Architectures (MAGMA) [10], we mainly focus on dense linear algebra routines similar to those from LAPACK [1]. Other than CUDA, there exist other frameworks that allow platform-independent programming for GPUs. The main three frameworks are: 1) ...

Publication date: 2003