Energy-Efficient Stream Compaction Through Filtering and Coalescing Accesses in GPGPU Memory Partitions

Authors

Abstract

Graph-based applications are essential in emerging domains such as data analytics or machine learning. Data gathering in a knowledge-based society requires great data processing efficiency. High-throughput GPGPU architectures are key to enabling efficient graph processing. Nonetheless, the irregular and sparse memory access patterns present in graph-based applications induce high memory divergence and contention, which result in poor GPGPU efficiency for graph processing. Recent work has pointed out the importance of stream compaction operations and has proposed a Stream Compaction Unit (SCU) to offload them to specialized hardware. On the other hand, memory contention caused by high divergence has been tackled with the Irregular accesses Reorder Unit (IRU), delivering improved memory coalescing. In this paper, we propose a new unit, the IRU-enhanced SCU (ISCU), that leverages the strengths of both approaches. The ISCU employs the mechanisms of the IRU to improve the SCU's stream compaction efficiency and throughput limitations, achieving a synergistic effect for graph processing. We evaluate the ISCU for a wide variety of state-of-the-art graph-based algorithms and applications. Results show that the ISCU achieves a performance speedup of 2.2x and 90 percent energy savings derived from a reduction of 78 percent in memory accesses, while incurring an 8.5 percent area overhead.

Similar resources

Spoc: GPGPU Programming through Stream Processing with OCaml

ions Skeletons and Composition : Tomorrow 4:30pm OpenGPU workshop DSL Embedded language to express kernel Real World Use Case 2DRMP : Dimensional R-matrix propagation (Computer Physics Communications) Simulates electron scattering from H-like atoms and ions at intermediate energies Multi-Architecture: MultiCore, GPGPU, Clusters, GPU Clusters Translate from Fortran + Cuda to OCaml+SPOC + Cuda/Op...

Efficient Optimization of Memory Accesses in Parallel Programs

k-Efficient partitions of graphs

A set $S = \{u_1, u_2, \ldots, u_t\}$ of vertices of $G$ is an efficient dominating set if every vertex of $G$ is dominated exactly once by the vertices of $S$. Letting $U_i$ denote the set of vertices dominated by $u_i$, we note that $\{U_1, U_2, \ldots, U_t\}$ is a partition of the vertex set of $G$ and that each $U_i$ contains the vertex $u_i$ and all the vertices at distance 1 from it in $G$. In this ...

Fast and energy-frugal deterministic test through efficient compression and compaction techniques

Conversion of the flip-flops of the circuit into scan cells helps ease the test challenge; yet test application time is increased as serial shift operations are employed. Furthermore, the transitions that occur in the scan chains during these shifts reflect into significant levels of circuit switching unnecessarily, increasing the power dissipated. Judicious encoding of the correlation among th...

Formalizing Memory Accesses and Interrupts

The hardware/software boundary in modern heterogeneous multicore computers is increasingly complex, and diverse across different platforms. A single memory access by a core or DMA engine traverses multiple hardware translation and caching steps, and the destination memory cell or register often appears at different physical addresses for different cores. Interrupts pass through a complex topolo...

Journal

Journal title: IEEE Transactions on Computers

Year: 2022

ISSN: 1557-9956, 2326-3814, 0018-9340

DOI: https://doi.org/10.1109/tc.2021.3104749