Impact of memory hierarchy on program partitioning and scheduling

Authors

  • Wesley K. Kaplow
  • William Maniatty
  • Boleslaw K. Szymanski
Abstract

In this paper we present a method for determining the cache performance of the loop nests in a program. The cache-miss data are produced by simulating the loop nest execution on an architecturally parameterized cache simulator. We show that the cache-miss rates are highly non-linear with respect to the ranges of the loops, and that they correlate well with the performance of the loop nests on actual target machines. The cache-miss ratio is used to guide program optimizations such as loop interchange and iteration-space blocking. It can also be used to estimate the runtime of a program. Both applications are important in scheduling programs for parallel execution. We present examples of program optimization for several popular processors, including the IBM 9076 SP1, the SuperSPARC, and the Intel i860.
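To make the idea in the abstract concrete, the following is a minimal sketch of a parameterized cache simulator applied to a two-level loop nest. It is not the authors' simulator: the function names (`simulate_misses`, `loop_nest_trace`, `miss_ratio`), the LRU replacement policy, and the default cache parameters are all assumptions chosen for illustration. It compares the miss ratio of a loop nest before and after loop interchange.

```python
from collections import OrderedDict


def simulate_misses(addresses, cache_bytes=1024, line_bytes=32, ways=1):
    """Count misses for a byte-address trace on a set-associative cache.

    Assumed model (hypothetical, for illustration): LRU replacement,
    no prefetching, reads and writes treated alike.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes          # cache line holding this address
        cache_set = sets[line % num_sets]  # set the line maps to
        if line in cache_set:
            cache_set.move_to_end(line)    # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[line] = True
    return misses


def loop_nest_trace(rows, cols, row_major_order=True, elem_bytes=8):
    """Yield the addresses touched when visiting every element of a
    rows x cols array stored in row-major layout.

    row_major_order=True  -> outer loop over i, inner loop over j
    row_major_order=False -> the interchanged nest (outer j, inner i)
    """
    if row_major_order:
        for i in range(rows):
            for j in range(cols):
                yield (i * cols + j) * elem_bytes
    else:
        for j in range(cols):
            for i in range(rows):
                yield (i * cols + j) * elem_bytes


def miss_ratio(rows, cols, row_major_order, **cache_params):
    """Miss ratio of the loop nest under the given cache parameters."""
    trace = list(loop_nest_trace(rows, cols, row_major_order))
    return simulate_misses(trace, **cache_params) / len(trace)


if __name__ == "__main__":
    for n in (8, 64):
        print(n, miss_ratio(n, n, True), miss_ratio(n, n, False))
```

Under these assumed parameters (1 KiB direct-mapped cache, 32-byte lines, 8-byte elements), an 8 x 8 array fits in the cache and both loop orders give a miss ratio of 0.25, while for a 64 x 64 array the interchanged order misses on every access (ratio 1.0) and the original order stays at 0.25. This toy case illustrates why the miss rate can change abruptly with the loop ranges and why it can guide a choice such as loop interchange.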

Similar resources

Low-Power L2 Cache Architecture for Multiprocessor System on Chip Design

A significant portion of cache energy in a highly associative cache is consumed during tag comparison. In this paper, tag comparison is carried out by predicting both cache hits and cache misses using a multistep tag comparison method. A partially tagged Bloom filter is used for cache miss prediction by checking the non-membership of the addresses, and a hotline check is used for cache hit prediction by reducing...

International Journal of Emerging Trends in Engineering and Development Issue 3, Vol.2 (May 2013) Available online on http://www.rspublication.com/ijeted/ijeted_index.htm ISSN 2249-6149

Carrot-hole Data Scheduling and Adaptive Partitioning for Memory Traffic Minimization

Massive uniform nested loops are broadly used in scientific and multi-dimensional Digital Signal Processing applications. Due to the amount of data handled by such applications, cache or on-chip memory is required to improve data access and overall system performance. Most existing application-specific systems do not efficiently optimize access to the different levels of the memory hierarchy. In...

Memory Architectures for NoC-Based Real-Time Mixed Criticality Systems

Mixed criticality systems (MCS) allow software components of differing criticalities to use the same physical resources (i.e., CPU, memory). MCS highlight the trade-off between partitioning components of different criticalities and efficient resource usage. Components are partitioned due to safety concerns, but physical partitioning requires more resources than if components are unpartitioned and...

A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning

The memory hierarchy in modern computing systems is typically time-shared and space-shared amongst multiple processes and threads, some of which execute simultaneously. Memory contention can significantly degrade the performance of running processes. Cache hit counters found in modern microprocessors provide a limited picture as to the memory needs of processes. We propose a low overhead, on-line...

Publication date: 1995