MERIT: Tensor Transform for Memory-Efficient Vision Processing on Parallel Architectures



Similar resources

Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures

This paper presents a new parallelization method for an efficient implementation of unstructured array reductions on shared memory parallel machines with OpenMP. This method is strongly related to parallelization techniques for irregular reductions on distributed memory machines as employed in the context of High Performance Fortran. By exploiting data locality, synchronization is minimized with...


Current Architectures for Parallel Processing

The purpose of this tutorial is to provide the basic concepts related to parallel processing. We will first discuss the fundamentals of parallel machine architectures and parallel programming, including a short view of Flynn's taxonomy with some implementation examples. An overview of some programming techniques supported in parallel environments follows, namely MPI-2 and OpenMP. Finally, a pre...


Memory-Efficient and High-Performance Parallel-Pipelined Architectures for 5/3 Forward and Inverse Discrete Wavelet Transform

In this paper, highly efficient lifting-based architectures for the 5/3 forward and inverse discrete wavelet transform (DWT) are proposed. The proposed parallel and pipelined architecture consists of a horizontal filter (HF) and a vertical filter (VF). The system delays of the proposed architectures are reduced. Filter coefficients of the biorthogonal 5/3 wavelet low-pass filter are quantized bef...


Cache write generate for parallel image processing on shared memory architectures

We investigate cache write generate, our cache mode invention. We demonstrate that for parallel image processing applications, the new mode improves main memory bandwidth, CPU efficiency, cache hits, and cache latency. We use register level simulations validated by the UW-Proteus system. Many memory, cache, and processor configurations are evaluated.


Integrated Memory/Network Architectures for Cluster-Organized, Parallel DSP Architectures

The capabilities of switched networks for parallel and distributed computers are evolving rapidly towards networks with various forms of intelligence in support of parallel execution of programs. This paper presents a perspective on intelligent networks, including reconfiguration of the network to adapt to the needs of successive computational algorithms being performed as part of an overall pr...



Journal

Journal title: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Year: 2020

ISSN: 1063-8210,1557-9999

DOI: 10.1109/tvlsi.2019.2953171