Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs
Authors
Abstract
The UTS benchmark is used to evaluate task parallelism in OpenMP 3.0 as implemented in a number of recently released compilers and run-time systems. UTS performs a parallel search of an irregular and unpredictable search space, such as arises in combinatorial optimization problems. As such, UTS presents a highly unbalanced task graph that challenges scheduling, load balancing, termination detection, and task coarsening strategies. Scalability and overheads are compared for OpenMP 3.0, Cilk, and an OpenMP implementation of the benchmark without tasks that performs all scheduling, load balancing, and termination detection explicitly. Current OpenMP 3.0 implementations generally exhibit poor behavior on the UTS benchmark.
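To make the kind of workload described in the abstract concrete, the following is a minimal sketch of an unbalanced tree search written with OpenMP 3.0 tasks. It is not the UTS benchmark code: the names `mix`, `num_children`, `count_subtree`, and `count_tree`, the branching rule, and the depth cap are all assumptions chosen for illustration, and the benchmark's actual tree generator is different. Each node spawns one task per child, so the shape of the task graph follows the shape of the tree, which is what makes load balancing hard.

```c
#include <assert.h>
#include <stdint.h>

#define ROOT_CHILDREN 8   /* assumed fan-out of the root node */
#define MAX_CHILDREN  8   /* upper bound on children of any node */

/* Hypothetical integer mixer used to derive pseudo-random node ids.
   This is an illustrative stand-in, not the benchmark's generator. */
static uint32_t mix(uint32_t x) {
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    x *= 0x45d9f3bU;
    x ^= x >> 16;
    return x;
}

/* Assumed branching rule: the root has ROOT_CHILDREN children; every
   other node has either 4 children or none, depending on its id.
   Nodes at max_depth are always leaves, so the tree is finite. */
static int num_children(uint32_t id, int depth, int max_depth) {
    if (depth >= max_depth) return 0;
    if (depth == 0) return ROOT_CHILDREN;
    return (mix(id) % 5u == 0u) ? 4 : 0;
}

/* Count the nodes in the subtree rooted at `id`, one task per child. */
static long count_subtree(uint32_t id, int depth, int max_depth) {
    int n = num_children(id, depth, max_depth);
    long sub[MAX_CHILDREN] = {0};

    for (int i = 0; i < n; i++) {
        uint32_t child = mix(id * 31u + (uint32_t)i + 1u);
        /* sub must be shared explicitly: locals of an orphaned function
           are firstprivate in a task by default. */
        #pragma omp task shared(sub) firstprivate(i, child)
        sub[i] = count_subtree(child, depth + 1, max_depth);
    }
    /* Wait for the child tasks before reading their results. */
    #pragma omp taskwait

    long total = 1;  /* this node */
    for (int i = 0; i < n; i++) total += sub[i];
    return total;
}

/* Entry point: one thread creates the root task, the team executes tasks. */
long count_tree(uint32_t root_id, int max_depth) {
    assert(max_depth >= 0);
    long total = 0;
    #pragma omp parallel
    #pragma omp single
    total = count_subtree(root_id, 0, max_depth);
    return total;
}
```

A call such as `count_tree(42u, 6)` returns the number of nodes in the generated tree. Compiled without OpenMP support, the pragmas are ignored and the same code runs sequentially with the same result. The sketch creates a task for every node, with no coarsening, which is the fine-grained regime in which run-time overheads become visible.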
Similar resources
Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures
The current generation of high performance computing platforms tends to have a large number of cores. Applications therefore have to expose fine-grained parallelism to be efficient. Since version 3.0, the OpenMP standard has offered a way to express such parallelism through tasks. Because the task scheduling strategy is implementation defined, each runtime can have a different behavior and effici...
OpenMP task scheduling strategies for multicore NUMA systems
The recent addition of task parallelism to the OpenMP shared memory API allows programmers to express concurrency at a high level of abstraction and places the burden of scheduling parallel execution on the OpenMP run time system. Efficient scheduling of tasks on modern multi-socket multicore shared memory systems requires careful consideration of an increasingly complex memory hierarchy, inclu...
Support for Fine Grained Dependent Tasks in OpenMP
OpenMP is widely used for shared memory parallel programming and is especially useful for the parallelisation of loops. When it comes to task parallelism, however, OpenMP is less powerful and the sections construct lacks support for dependences and fine grained tasks. This paper proposes a new work-sharing construct, tasks, which is a generalisation of sections. It goes beyond sections by allow...
Evaluating OpenMP Tasking at Scale for the Computation of Graph Hyperbolicity
We describe using OpenMP to compute δ-hyperbolicity, a quantity of interest in social and information network analysis, at a scale that uses up to 1000 threads. By considering both OpenMP worksharing and tasking models to parallelize the computations, we find that multiple task levels permit finer grained tasks at runtime and result in better performance at scale than worksharing constructs. We...
Performance Monitoring and Analysis of Task-Based OpenMP
OpenMP, a typical shared memory programming paradigm, has been extensively applied in the high performance computing community due to the popularity of multicore architectures in recent years. The most significant feature of the OpenMP 3.0 specification is the introduction of the task constructs to express parallelism at a much finer level of detail. This feature, however, has posed new challenges ...