Executing Process Networks on Heterogeneous Platforms using OpenCL

Authors

Lars Schor

Andreas Tretter

Lothar Thiele

Abstract

Upcoming heterogeneous systems call for new programming paradigms. Abstracting the underlying hardware architecture is desirable in order to support productive software development. This thesis proposes a design flow and runtime system for executing process networks on heterogeneous systems using OpenCL. Process networks are a popular model of computation for deterministic parallel programming, and OpenCL is a royalty-free, standardised programming interface with broad industry support. The proposed design flow consists of a program code synthesis framework for building applications from a generic high-level process network specification. The synthesised application targets a particular OpenCL architecture that is predefined by a high-level specification. The target code is built for this architecture specification by integrating reusable building-block primitives into it. These primitives are location-based FIFO channels that minimise the number of memory copy operations, a process wrapper that can be connected to channels and mirrors the process network functionality, and an extensible task activation framework responsible for interprocess synchronisation. Heterogeneous systems, being inherently parallel computer architectures, demand scalable and parallel applications. To simplify this, the notion of shadow copies is introduced to transparently abstract data parallelism. Extensive evaluations on two heterogeneous systems have shown that the proposed design flow and runtime system support a wide range of heterogeneous platforms and parallel applications. Furthermore, the evaluations have shown that the proposed framework is suitable for efficient and productive software development for heterogeneous systems, and that it provides enough flexibility for the programmer to efficiently exploit the parallelism offered by multicore CPUs and GPUs.
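As a rough illustration of the process network model the abstract refers to, the following is a minimal sketch in plain Python, not the thesis's OpenCL runtime. It assumes three processes connected by bounded FIFO channels with blocking reads and writes. The names `Channel`, `producer`, `square`, `consumer`, `run_network` and the `SENTINEL` end-of-stream marker are hypothetical and chosen only for this example.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not taken from the thesis


class Channel:
    """Bounded FIFO channel with blocking read and write."""

    def __init__(self, capacity=4):
        self._q = queue.Queue(maxsize=capacity)

    def write(self, token):
        self._q.put(token)  # blocks while the channel is full

    def read(self):
        return self._q.get()  # blocks while the channel is empty


def producer(out_ch, n):
    """Source process: emits 0..n-1, then the end-of-stream marker."""
    for i in range(n):
        out_ch.write(i)
    out_ch.write(SENTINEL)


def square(in_ch, out_ch):
    """Worker process: squares each token and forwards it."""
    while True:
        token = in_ch.read()
        if token is SENTINEL:
            out_ch.write(SENTINEL)
            return
        out_ch.write(token * token)


def consumer(in_ch, results):
    """Sink process: collects tokens until end of stream."""
    while True:
        token = in_ch.read()
        if token is SENTINEL:
            return
        results.append(token)


def run_network(n=5):
    """Wire up producer -> square -> consumer and run each process in a thread."""
    a, b = Channel(), Channel()
    results = []
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=square, args=(a, b)),
        threading.Thread(target=consumer, args=(b, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_network(5))
```

Because every channel has exactly one writer and one reader and processes only communicate through blocking FIFO operations, the output does not depend on thread scheduling, which is the determinism property the abstract attributes to process networks.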

Similar resources

In-place Recursive Approach for All-pairs Shortest Paths Problem Using OpenCL

The all-pairs shortest paths (APSP) problem finds the shortest path distances between all pairs of vertices, and is one of the most fundamental graph problems. In this paper, a parallel recursive partitioning approach to the APSP problem using Open Computing Language (OpenCL) for directed and dense graphs with no negative cycles, based on the R-Kleene algorithm, is presented, which recursively partitions ...

Leveraging Parallelism with CUDA and OpenCL

Graphics processing units (GPUs), originally designed for computing and manipulating pixels, have become general-purpose processors capable of executing in excess of a trillion calculations per second. Taking advantage of the GPU's compute power and commodity popularity, the field of computing systems is exhibiting a trend toward heterogeneous platforms consisting of a central processor integrated wi...

Parallel Computing for Accelerated Texture Classification with Local Binary Pattern Descriptors using OpenCL

In this paper, a novel parallelized implementation of rotation invariant texture classification using Heterogeneous Computing Platforms like CPU and Graphics Processing Unit (GPU) is proposed. A complete modeling of the LBP operator as well as its improvised versions of Complete Local Binary Patterns (CLBP) and Multi-scale Local Binary Patterns (MLBP) has been developed on a CPU and GPU based H...

GROMACS on Hybrid CPU-GPU and CPU-MIC Clusters: Preliminary Porting Experiences, Results and Next Steps (Partnership for Advanced Computing in Europe, available online at www.prace-ri.eu)

This report introduces a hybrid implementation of the GROMACS application, and provides instructions on building and executing on PRACE prototype platforms with Graphical Processing Units (GPU) and Many Integrated Cores (MIC) accelerator technologies. GROMACS currently employs message-passing MPI parallelism, multi-threading using OpenMP and contains kernels for non-bonded interactions that are ...

Performance Portability in Accelerated Parallel Kernels

Heterogeneous architectures, by definition, include multiple processing components with very different microarchitectures and execution models. In particular, computing platforms from supercomputers to smartphones can now incorporate both CPU and GPU processors. Disparities between CPU and GPU processor architectures have naturally led to distinct programming models and development patterns for...

Publication date: 2013
