MERIT: Tensor Transform for Memory-Efficient Vision Processing on Parallel Architectures
Similar resources
Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures
This paper presents a new parallelization method for an efficient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized with...
Current Architectures for Parallel Processing
The purpose of this tutorial is to provide the basic concepts related to parallel processing. We will first discuss the fundamentals of parallel machine architectures and parallel programming, including a short view of Flynn's taxonomy with some implementation examples. An overview of some programming techniques supported in parallel environments follows, namely MPI-2 and OpenMP. Finally, a pre...
Memory-Efficient and High-Performance Parallel-Pipelined Architectures for 5/3 Forward and Inverse Discrete Wavelet Transform
In this paper, high-efficient lifting-based architectures for the 5/3 forward and inverse discrete wavelet transform (DWT) are proposed. The proposed parallel and pipelined architecture consists of a horizontal filter (HF) and a vertical filter (VF). The system delays of the proposed architectures are reduced. Filter coefficients of the biorthogonal 5/3 wavelet low-pass filter are quantized bef...
Cache write generate for parallel image processing on shared memory architectures
We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.
Integrated Memory/Network Architectures for Cluster-Organized, Parallel DSP Architectures
The capabilities of switched networks for parallel and distributed computers are evolving rapidly towards networks with various forms of intelligence in support of parallel execution of programs. This paper presents a perspective on intelligent networks, including reconfiguration of the network to adapt to the needs of successive computational algorithms being performed as part of an overall pr...
Journal
Journal title: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year: 2020
ISSN: 1063-8210,1557-9999
DOI: 10.1109/tvlsi.2019.2953171