A Study of Leveraging Memory Level Parallelism for DRAM System on Multi-Core Architecture

Authors

  • Yuxuan Wang
  • Yingping Zhang
  • Xiaotian Zhang
  • Jian Yin
  • Licheng Chen
Abstract

The DRAM system has become increasingly critical in modern multi-core architectures, where Moore's law has steadily increased the number of cores integrated on a processor chip. DRAM system performance is usually measured in terms of bandwidth and latency, which previous studies regard as depending inherently on the Row Buffer Hit Rate (RBHR). In this paper, we find that Memory Level Parallelism (MLP) exhibits a stronger correlation with DRAM system performance on multi-core/many-core architectures than RBHR does, and that promoting MLP significantly improves DRAM system performance. To exploit MLP, we evaluated several approaches, including multi-bank, multi-row-buffer, and multi-memory-controller designs as well as the obsolete Virtual Channel Memory (VCM). The experimental results show that VCM is a better alternative to the traditional DRAM chip on multi-core/many-core architectures than the other three approaches, because it combines almost all of their advantages: 1) it improves the IPC of homogeneous workloads by 2.21X on a 16-core system with 32 virtual channels by leveraging otherwise unexploited MLP; 2) it improves the Quality-of-Service (QoS) of the DRAM system by removing unfairness when memory controllers serve memory requests; and 3) it saves energy and has low area cost. Unfortunately, VCM, which was proposed in the late 1990s, faded away before multi-core/many-core became dominant. We therefore suggest that memory chip vendors reconsider the VCM technology for multi-core architectures.
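The contrast between RBHR and MLP can be made concrete with a small sketch. The trace format, the open-page policy, and the fixed service time below are assumptions made for illustration, not the paper's simulator or methodology: RBHR counts the accesses that find their row already open, while MLP averages how many requests are in flight at once.

```python
# Toy metrics over a memory-access trace of (cycle, bank, row) tuples.
# Hypothetical trace format and timing; a sketch, not the paper's evaluation setup.

def row_buffer_hit_rate(trace):
    """RBHR = accesses that find their row already open / all accesses (open-page policy)."""
    open_row = {}          # bank -> currently open row
    hits = 0
    for _, bank, row in trace:
        if open_row.get(bank) == row:
            hits += 1
        open_row[bank] = row
    return hits / len(trace)

def memory_level_parallelism(trace, service_cycles=50):
    """MLP = average number of requests in flight, sampled at each arrival,
    assuming every request occupies the memory system for `service_cycles`."""
    in_flight = []         # completion cycles of outstanding requests
    samples, parallel_sum = 0, 0
    for cycle, _, _ in sorted(trace):
        in_flight = [c for c in in_flight if c > cycle]   # retire finished requests
        in_flight.append(cycle + service_cycles)
        samples += 1
        parallel_sum += len(in_flight)
    return parallel_sum / samples

trace = [(0, 0, 7), (1, 1, 3), (2, 0, 7), (3, 2, 9), (60, 0, 8)]
print(row_buffer_hit_rate(trace), memory_level_parallelism(trace))
```

On a trace where consecutive requests target different banks, MLP rises even while RBHR stays low, which matches the abstract's observation that the two metrics need not track each other.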


Related articles

Scalable and Flexible heterogeneous multi-core system

Multi-core systems are widely used in today's applications due to their lower power consumption and high performance. Many researchers aim to improve the performance of these systems by providing flexible multi-core architectures. Flexibility in a multi-core processor system provides high throughput for uniform parallel applications as well as high performance for more general workloads. This fl...


DRAM-Aware Last-Level Cache Replacement

The cost of last-level cache misses and evictions depends significantly on three major performance-related characteristics of DRAM-based main memory systems: bank-level parallelism, row buffer locality, and write-caused interference. Bank-level parallelism and row buffer locality introduce different latency costs for the processor to service misses: parallel or serial, fast or slow. Write-caused...
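As a back-of-the-envelope illustration of the latency costs this excerpt distinguishes (the timing constants are hypothetical placeholders, not values from the paper): a row-buffer hit skips the activate step, two misses to different banks largely overlap, and two misses to the same bank serialize.

```python
# Hypothetical DRAM timing for illustration only.
T_ACT, T_CAS = 15, 15   # assumed ns for row activate and column access

def service_ns(row_hit):
    """Time for one request: a row-buffer hit skips the activate."""
    return T_CAS if row_hit else T_ACT + T_CAS

# Two last-level cache misses, both mapping to closed rows:
same_bank  = 2 * service_ns(False)                       # serialized -> 60 ns
diff_banks = max(service_ns(False), service_ns(False))   # overlapped -> 30 ns
row_hits   = 2 * service_ns(True)                        # serial but fast -> 30 ns
print(same_bank, diff_banks, row_hits)
```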


Evaluating Row Buffer Locality in Future Non-Volatile Main Memories

DRAM-based main memories have read operations that destroy the read data, and as a result, must buffer large amounts of data on each array access to keep chip costs low. Unfortunately, system-level trends such as increased memory contention in multi-core architectures and data mapping schemes that improve memory parallelism may cause only a small amount of the buffered data to be accessed. This...


The Hierarchical Multi-Bank DRAM: A High-Performance Architecture for Memory Integrated with Processors

A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However, a high performance microprocessor will typically send more accesses than the DRAM can handle due to the long cycle time of the embedded DRAM, especially in applications with significant memory requirements. A multi-bank...
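The mismatch this excerpt describes can be sketched with placeholder numbers (assumed here, not taken from the paper): a single bank with row cycle time t_RC absorbs at most 1/t_RC accesses per nanosecond, so a processor that issues accesses faster than that needs roughly enough independent banks for N/t_RC to exceed its request rate.

```python
# Hypothetical numbers: embedded-DRAM row cycle time vs. processor request rate.
t_rc_ns       = 60.0    # assumed: one bank accepts a new access every 60 ns
issue_gap_ns  = 10.0    # assumed: the processor issues a DRAM access every 10 ns

for banks in (1, 4, 16):
    sustainable = banks / t_rc_ns      # accesses per ns the banks can absorb together
    demanded    = 1 / issue_gap_ns     # accesses per ns the processor generates
    print(banks, "banks:", "keeps up" if sustainable >= demanded else "falls behind")
```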


An Effective DRAM Cache Architecture for Scale-Out Servers

Scale-out workloads are characterized by in-memory datasets, and consequently massive memory footprints. Due to the abundance of request-level parallelism found in these workloads, recent research advocates for manycore architectures to maximize throughput while maintaining quality of service. On-die stacked DRAM caches have been proposed to provide the required bandwidth for manycore servers t...



Journal:

Volume:   Issue:

Pages:  -

Publication date: 2016