Characterizing the Memory Behavior of Compiler-Parallelized Applications
Abstract
Compiler-parallelized applications are increasing in importance as moderate-scale multiprocessors become common. This paper evaluates how features of advanced memory systems (e.g., longer cache lines) impact memory system behavior for applications amenable to compiler parallelization. Using full-sized input data sets and applications taken from standard benchmark suites, we measure statistics such as speedups, synchronization and load imbalance, causes of cache misses, cache line utilization, data traffic and memory costs. This exploration allows us to draw several conclusions. First, we find that larger granularity parallelism often correlates with good memory system behavior, good overall performance, and high speedup in these applications. Second, we show that when long (512 byte) cache lines are used, many of these applications suffer from false sharing and low cache line utilization. Third, we identify some of the common artifacts in compiler-parallelized codes that can lead to false sharing or other types of poor memory system performance, and we suggest methods for improving them. Overall, this study offers both an important snapshot of the behavior of applications compiled by current parallelizing compilers, as well as an increased understanding of the interplay between cache line size, program granularity, and memory performance in moderate-scale multiprocessors.
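The interaction between cache line size and false sharing described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function name, the two loop schedules ("block" and "cyclic"), and the assumption that the array starts on a cache line boundary are all hypothetical choices made for illustration. The sketch counts cache lines whose elements are written by more than one processor when a parallel loop writes each element of an array once.

```python
from collections import defaultdict


def falsely_shared_lines(n_elems, elem_bytes, line_bytes, n_procs, schedule="block"):
    """Count cache lines written by more than one processor.

    Assumes (hypothetically) a line-aligned array where iteration i writes
    element i, and iterations are assigned either in contiguous blocks or
    round-robin ("cyclic") across processors.
    """
    chunk = -(-n_elems // n_procs)  # ceiling division: iterations per processor
    writers = defaultdict(set)      # cache line index -> set of writing processors
    for i in range(n_elems):
        owner = i // chunk if schedule == "block" else i % n_procs
        line = (i * elem_bytes) // line_bytes
        writers[line].add(owner)
    return sum(1 for procs in writers.values() if len(procs) > 1)


if __name__ == "__main__":
    for line_bytes in (32, 512):
        for schedule in ("block", "cyclic"):
            shared = falsely_shared_lines(1000, 8, line_bytes, 8, schedule)
            print(f"line={line_bytes}B schedule={schedule}: {shared} shared lines")
```

With 8-byte elements, a 512-byte line holds 64 elements, so a fine-grained cyclic schedule places writes from every processor on every line, while a block schedule confines sharing to the lines that straddle block boundaries. This is one simple way to see why coarser granularity and cache line size interact in this toy model; the paper's own measurements are based on full applications, not on a model like this.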
Related resources
Evaluating the impact of advanced memory systems on compiler-parallelized codes
Compiler-parallelized applications are increasing in importance as moderate-scale multiprocessors become common. This paper evaluates how features of advanced memory systems (e.g., longer cache lines) impact memory system behavior for applications amenable to compiler parallelization. Using full-sized input data sets and applications taken from the SPEC, NAS, PERFECT, and RICEPS benchmark suite...
An Integrated Runtime and Compile-time Approach for Parallelizing Structured and Block Structured Applications
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). In this paper, we present a combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independe...
TWO BLOCKS : 49 X 9 X 9 Mesh
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid or adaptive codes) and/or irregularly coupled (called Irregularly Coupled Regular Meshes). We have designed and implemented a runtime library for parallelizing this general class of applications on distributed memory parallel machines in an efficient and machine independent manner. In...
Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple mod...
Using Locality Information in Userlevel Scheduling TR-95-14
In the past few years, MIMD parallel computers have become important not only in the field of high performance scientific computing, but also as ordinary compute servers. Applications that cannot be parallelized by an appropriate compiler get more and more parallelized by use of the threads programming model. Especially for machines having large caches and/or non-uniform memory access special car...
Journal title:
- IEEE Trans. Parallel Distrib. Syst.
Volume 7
Publication date: 1996