Performance Benchmarking Locality Aware Runtime for NUMA Architecture
Abstract
Non-Uniform Memory Access (NUMA) architectures introduce a new level of difficulty for programmers. Without knowledge of the underlying runtime, programs can perform poorly because they are unaware of the differing memory latencies in NUMA systems. We seek to alleviate this issue by implementing a runtime that schedules computations on individual NUMA nodes. This study uses a systematic approach to performance evaluation and a full factorial experimental design. We implemented our solution within the Cilk runtime and ran our experiments on a 4-socket NUMA machine.
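As an illustration of what a full factorial experimental design involves, the following minimal sketch enumerates every combination of factor levels. The factor names and levels here are hypothetical placeholders; the abstract does not state which factors were actually varied.

```python
from itertools import product

# Hypothetical factors and levels, chosen only for illustration.
# The abstract does not list the factors used in the study.
factors = {
    "scheduler": ["baseline", "locality_aware"],
    "workers": [8, 16, 32],
    "benchmark": ["fib", "matmul"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


runs = full_factorial(factors)
print(len(runs))  # 2 * 3 * 2 = 12 configurations
```

In a full factorial design every configuration is measured, so the number of runs grows as the product of the level counts. That makes the design thorough but expensive once more factors are added.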
Similar resources
Topology-Aware Parallelism for NUMA Copying Collectors
NUMA-aware parallel algorithms in runtime systems attempt to improve locality by allocating memory from local NUMA nodes. Researchers have suggested that the garbage collector should profile memory access patterns or use object locality heuristics to determine the target NUMA node before moving an object. However, these solutions are costly when applied to every live object in the reference gra...
Characterization of Locality Aware Task Scheduling Mechanism
The architectural features of modern computers highlight the need for parallel programming to sustain performance. This paper deals with task-based programming for modern computers. Because existing task scheduling lacks data locality, communication optimization, and task characterization support, we intend to overview the characterization of locality aware task scheduli...
An Analysis of Shared Library Performance on NUMA Architectures
Most modern multicore systems are Non-Uniform Memory Architecture (NUMA) systems, which means they have multiple memory controllers with non-uniform access latencies across them. A significant amount of work has been done on exploring and mitigating the performance penalty due to NUMA overhead. In previous works, NUMA-aware schedulers were proposed, sometimes with an objective of keeping dat...
Improving Parallel System Performance with a NUMA-aware Load Balancer
Multi-core nodes with Non-Uniform Memory Access (NUMA) are now a common architecture for high performance computing. On such NUMA nodes, the shared memory is physically distributed into memory banks connected by a network. Owing to this, memory access costs may vary depending on the distance between the processing unit and the memory bank. Therefore, a key element in improving the performance o...
A Transparent Runtime Data Distribution Engine for OpenMP
This paper makes two important contributions. First, the paper investigates the performance implications of data placement in OpenMP programs running on modern NUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that due to the low remote-to-local memory access latency ratio of contempo...