Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation Decoupling
Authors
Abstract
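Processing-in-Memory (PIM) has been actively studied to overcome the memory bottleneck by placing computing units near or in memory, especially for efficiently processing low-locality data-intensive applications. We can categorize in-DRAM PIMs depending on how many banks perform PIM computation with one DRAM command: per-bank and all-bank. The per-bank PIM operates only one bank, delivering low performance but preserving the standard DRAM interface and servicing non-PIM requests during PIM execution. The all-bank PIM operates all banks, achieving high performance but accompanying design issues like thermal and power consumption. We introduce memory-computation decoupling execution to achieve the ideal all-bank PIM performance while preserving the standard JEDEC DRAM interface, i.e., performing per-bank execution, so that it is easily adapted to commercial platforms. We divide PIM execution into two phases: memory and computation phases. At the memory phase, we read bank-private operands from a bank and store them in the PIM engines' registers bank-by-bank. At the computation phase, we decouple the PIM engine from the memory array and broadcast a bank-shared operand using a standard read/write command to make all banks perform the computation simultaneously, reaching the computing throughput of the all-bank PIM. For extending the computation phase, i.e., maximizing the all-bank execution opportunity, we introduce a compiler analysis and code generation technique to identify bank-private and bank-shared operands. We compared the performance of Level-2/3 BLAS, a multi-batch LSTM-based Seq2Seq model, and BERT on our decoupled PIM with commercial computing platforms. In Level-3 BLAS, we achieved speedups of 75.8x, 1.2x, and 4.7x compared to the CPU, GPU, and per-bank PIM, and up to 91.4% of the ideal all-bank PIM performance. Furthermore, our decoupled PIM consumed less energy than the CPU, GPU, and per-bank PIM by 72.0%, 78.4%, and 7.4%, and a little more ...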
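The two-phase execution described in the abstract can be illustrated with a toy functional model. The sketch below is not the authors' implementation: the function names, the one-command-per-register-load and one-command-per-broadcast cost model, and the assumption that each engine has enough registers to hold a full matrix row are all hypothetical simplifications, and DRAM timing is ignored entirely. It assumes row `b` of matrix `a` is the bank-private operand of bank `b`, while matrix `x` is the bank-shared operand.

```python
def decoupled_matmul(a, x):
    """Toy model of memory-computation decoupling for C = A * X.

    a[b] is the bank-private row held in bank b (assumed to fit in that
    engine's registers); x is shared by all banks. Returns the result
    matrix and the number of DRAM commands under the assumed cost model.
    """
    num_banks, depth, width = len(a), len(x), len(x[0])
    commands = 0

    # Memory phase: fill each engine's registers, one bank at a time.
    registers = []
    for bank in range(num_banks):
        registers.append(list(a[bank]))
        commands += depth  # assumed: one read command per private operand

    # Computation phase: engines are decoupled from the memory array, so a
    # single broadcast command drives one multiply-accumulate in every bank.
    acc = [[0] * width for _ in range(num_banks)]
    for col in range(width):
        for k in range(depth):
            shared = x[k][col]
            commands += 1  # assumed: one broadcast command per shared operand
            for bank in range(num_banks):
                acc[bank][col] += registers[bank][k] * shared
    return acc, commands


def per_bank_commands(num_banks, depth, width):
    """Assumed baseline: every multiply-accumulate needs its own command."""
    return num_banks * depth * width


def all_bank_commands(depth, width):
    """Assumed ideal: one command triggers the operation in all banks."""
    return depth * width
```

Under this model the decoupled scheme issues `num_banks * depth + depth * width` commands, so the memory phase is amortized better as `width` grows. That is one way to read the abstract's emphasis on extending the computation phase, and it says nothing about the measured speedups reported in the paper.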
Similar resources
The Hierarchical Multi-Bank DRAM: A High-Performance Architecture for Memory Integrated with Processors
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However, a high performance microprocessor will typically send more accesses than the DRAM can handle due to the long cycle time of the embedded DRAM, especially in applications with significant memory requirements. A multi-bank...
Derivation of a DRAM Memory Interface by Sequential Decomposition
Design and synthesis of DRAM-based memory systems has been a difficult task in high-level system synthesis because of the relatively complex protocols involved. In this paper, we illustrate a method for top-down design of a DRAM memory interface using a transformational approach. Sequential decomposition of the DRAM memory interface entails extraction of a DRAM memory object from a system descript...
Effect of Working Memory Training on the Improving Reading Performance and Working Memory Capacity in Children with Dyslexia
Introduction: In recent years, researchers have focused on students who have challenges in learning, and these problems affect their educational process. This study aimed to investigate the effect of working memory training programs on improving reading performance and working memory capacity in children with dyslexia. Method: The research method was quasi-experimental. In this regard 30...
Memory Performance among Children with ADHD
Introduction: The present post-eventual research study was conducted to compare memory performance between two distinct groups of 50 healthy children and 50 children with attention deficit hyperactivity disorder (ADHD) (25 girls and 25 boys) in Tehran, aged 10-12. Methods: All students were selected through simple random sampling and were assessed ...
Journal
Journal title: IEEE Access
Year: 2022
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2022.3203051