Investigating memory prefetcher performance over parallel applications: From real to simulated

Abstract

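Memory prefetcher algorithms are widely used in processors to mitigate the performance gap between the processors and the memory subsystem. The complexities behind the architectures and prefetcher algorithms, however, not only hinder the development of accurate architecture simulators, but also hinder understanding of the prefetcher's contribution to performance, on both real hardware and in a simulated environment. In this paper, we contribute to shedding light on the memory prefetcher's role in the performance of parallel High-Performance Computing applications, considering the prefetcher algorithms offered by both the real hardware and the simulators. We performed a careful experimental investigation, executing the NAS parallel benchmark (NPB) on a Skylake machine, as well as in a simulated environment with the ZSim and Sniper simulators, taking into account the prefetchers offered by both. Our results show that: (i) prefetching from the L3 to the L2 cache presents better performance gains, (ii) the memory contention in the parallel execution constrains the prefetcher's effect, (iii) Skylake's parallel memory contention is poorly simulated by ZSim and Sniper, and (iv) Skylake's noninclusive L3 cache hinders the accurate simulation of NPB with Sniper's prefetchers.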

Related articles

Performance Analysis of Shared-Memory Parallel Applications Using Performance Properties

Tuning parallel code can be a time-consuming and difficult task. We present our approach to automate the performance analysis of OpenMP applications that is based on the notion of performance properties. Properties are formally specified in the APART specification language (ASL) with respect to a specific data model. We describe a data model for summary (profiling) data of OpenMP applications a...

Performance Analysis of Parallel Java Applications on Shared-memory Systems

In this paper we describe an instrumentation environment for the performance analysis and visualization of parallel applications written in JOMP, an OpenMP-like interface for Java. The environment includes two complementary approaches. The first one has been designed to provide a detailed analysis of the parallel behavior at the JOMP programming model level. At this level, the user is faced wit...

Performance Analysis of Multilevel Parallel Applications on Shared Memory Architectures

In this paper we describe how to apply powerful performance analysis techniques to understand the behavior of multilevel parallel applications. We use the Paraver/OMPItrace performance analysis system for our study. This system consists of two major components: The OMPItrace dynamic instrumentation mechanism, which allows the tracing of processes and threads and the Paraver graphical user inter...

Protecting Memory-Performance Critical Sections in Soft Real-Time Applications

Soft real-time applications such as multimedia applications often show bursty memory access patterns—regularly requiring a high memory bandwidth for a short duration of time. Such a period is often critical for timely data processing. Hence, we call it a memory-performance critical section. Unfortunately, in multicore architecture, non-real-time applications on different cores may also demand h...

Hierarchical Parallel Simulated Annealing and Its Applications

In this paper we propose a new parallelization scheme for Simulated Annealing — Hierarchical Parallel SA (HPSA). This new scheme features coarse-granularity in parallelization, directed at message-passing systems such as clusters. It combines heuristics such as adaptive clustering with SA to achieve more efficiency in local search. Through experiments with various optimization problems and comp...

Journal

Journal title: Concurrency and Computation: Practice and Experience

Year: 2021

ISSN: 1532-0634, 1532-0626

DOI: https://doi.org/10.1002/cpe.6207