Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation Decoupling

Abstract

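Processing-in-Memory (PIM) has been actively studied as a way to overcome the memory bottleneck by placing computing units near or in memory, especially for efficiently processing data-intensive applications with low locality. In-DRAM PIMs can be categorized by how many banks perform PIM computation per DRAM command: per-bank and all-bank. The per-bank PIM operates only one bank, delivering low performance but preserving the standard DRAM interface and servicing non-PIM requests during PIM execution. The all-bank PIM operates all banks, achieving high performance but accompanying design issues such as thermal and power consumption. We introduce memory-computation decoupling execution to achieve the performance of an ideal all-bank PIM while preserving the standard JEDEC DRAM interface, i.e., performing per-bank execution, so that it is easily adapted to commercial platforms. We divide PIM execution into two phases: memory and computation phases. In the memory phase, we read bank-private operands from a bank and store them in the PIM engines' registers, bank by bank. In the computation phase, we decouple the PIM engine from the memory array and broadcast a bank-shared operand using a standard read/write command to make all banks perform the computation simultaneously, reaching the computing throughput of the all-bank PIM. To extend the computation phase, i.e., to maximize the all-bank execution opportunity, we introduce a compiler analysis and code generation technique to identify bank-private and bank-shared operands. We compared the performance of Level-2/3 BLAS, a multi-batch LSTM-based Seq2Seq model, and BERT on our decoupled PIM with commercial platforms. In Level-3 BLAS, we achieved speedups of 75.8x, 1.2x, and 4.7x over the CPU, GPU, and per-bank PIM, respectively, and up to 91.4% of the ideal all-bank PIM performance. Furthermore, our decoupled PIM consumed 72.0%, 78.4%, and 7.4% less energy than the CPU, GPU, and per-bank PIM, respectively, and a little more ...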
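The two-phase idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: the function names, the choice of a matrix product as the workload, and the one-command-per-step cost model are all assumptions made for illustration. Each bank holds one matrix row as its bank-private operand, and each column of the second matrix plays the role of a bank-shared operand.

```python
def decoupled_pim_matmul(a_rows, b_cols):
    """Toy model of memory-computation decoupling (hypothetical, illustrative only).

    a_rows[b] is the bank-private operand stored in bank b.
    b_cols[j] is a bank-shared operand broadcast to every bank.
    Returns the product and the number of modeled DRAM commands.
    """
    num_banks = len(a_rows)
    commands = 0

    # Memory phase: load each bank's private operand into its PIM engine
    # registers, one bank at a time (assumed cost: one command per bank).
    registers = []
    for bank in range(num_banks):
        registers.append(list(a_rows[bank]))
        commands += 1

    # Computation phase: the engines are decoupled from the memory array,
    # so one broadcast command feeds the shared operand to all banks and
    # every engine computes on its registers at the same time.
    result = [[0] * len(b_cols) for _ in range(num_banks)]
    for j, shared in enumerate(b_cols):
        commands += 1  # assumed cost: one broadcast command per shared operand
        for bank in range(num_banks):
            result[bank][j] = sum(r * s for r, s in zip(registers[bank], shared))

    return result, commands


def per_bank_command_count(num_banks, num_shared):
    """Commands needed if every compute step targets a single bank (same toy model)."""
    return num_banks * num_shared


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]            # two banks, one row each
    b = [[1, 0], [0, 1], [1, 1]]    # three shared operands
    product, cmds = decoupled_pim_matmul(a, b)
    print(product, cmds, per_bank_command_count(len(a), len(b)))
```

In this model the decoupled scheme issues one command per bank plus one per shared operand, whereas per-bank execution issues one command per bank for every shared operand, so the gap widens as the computation phase gets longer. This is the intuition behind extending the computation phase, under the stated assumptions.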

Related resources

The Hierarchical Multi-Bank DRAM: A High-Performance Architecture for Memory Integrated with Processors

A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However, a high performance microprocessor will typically send more accesses than the DRAM can handle due to the long cycle time of the embedded DRAM, especially in applications with significant memory requirements. A multi-bank...

Derivation of a DRAM Memory Interface by Sequential Decomposition

Design and synthesis of DRAM-based memory systems has been a difficult task in high-level system synthesis because of the relatively complex protocols involved. In this paper, we illustrate a method for top-down design of a DRAM memory interface using a transformational approach. Sequential decomposition of the DRAM memory interface entails extraction of a DRAM memory object from a system descript...

Effect of Working Memory Training on the Improving Reading Performance and Working Memory Capacity in Children with Dyslexia

Introduction: In recent years, researchers have focused on students who have learning challenges, as these problems affect their educational process. This study aimed to investigate the effect of working memory training programs on improving reading performance and working memory capacity in children with dyslexia. Method: The research method was quasi-experimental. In this regard 30...

Memory Performance among Children with ADHD

Introduction: The present post-eventual research study was conducted to compare memory performance between two distinct groups in Tehran, 50 healthy children and 50 children with attention deficit hyperactivity disorder (ADHD) (25 girls and 25 boys), aged 10-12. Methods: All students were selected through a simple random sampling method and were assessed ...

Journal

Journal title: IEEE Access

Year: 2022

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2022.3203051