Automated Parallelization of Non-uniform Convolutions on Chip Multiprocessors
Abstract
This paper introduces an approach for automatic parallelization of unequally spaced convolutions on chip multiprocessors (CMPs). Compared to uniprocessor-based architectures, CMPs are promising candidates for high-throughput, low-power digital processing in signal and image systems. As CMPs emerge and evolve in increasing diversity and complexity, automated parallelization of digital signal processing (DSP) algorithms on CMPs becomes essential to adopting these architectures in high-performance signal and image systems.
Similar resources
Utilization of Cache Area in On-Chip Multiprocessor
On-chip multiprocessor can be an alternative to the wide-issue superscalar processor approach which is currently the mainstream to exploit the increasing number of transistors on a silicon chip. Utilization of the cache, especially for the remote data, is important in the system using such on-chip multiprocessors since the ratio of the off-chip and the on-chip memory access latencies is higher th...
Performance Analysis of Non-Uniform Cache Architecture Policies for Chip-Multiprocessors Using the Parsec v2.0 Benchmark Suite
Non-Uniform Cache Architectures (NUCA) have been proposed as a solution to overcome wire delays that will dominate on-chip latencies in Chip Multiprocessor designs in the near future. This novel means of organization divides the total memory area into a set of banks that provides non-uniform access latencies and thus faster access to those banks that are close to the processor. A NUCA model can...
Pii: S0141-9331(00)00094-6
On-chip multiprocessor can be an alternative to the wide-issue superscalar processor approach which is currently the mainstream to exploit the increasing number of transistors on a silicon chip. Utilization of the cache, especially for the remote data, is important in the system using such on-chip multiprocessors since the ratio of the off-chip and the on-chip memory access latencies is higher t...
Automatic Parallelization for Non-cache Coherent Multiprocessors
Although much work has been done on parallelizing compilers for cache coherent shared memory multiprocessors and message-passing multiprocessors, there is relatively little research on parallelizing compilers for non-cache coherent multiprocessors with global address space. In this paper, we present a preliminary study on automatic parallelization for the Cray T3D, a commercial scalable machine ...
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization
As we look to the future, and the prospect of a billion transistors on a chip, it seems inevitable that microprocessors will exploit having multiple parallel threads. To achieve the full potential of these “single-chip multiprocessors,” however, we must find a way to parallelize non-numeric applications. Unfortunately, compilers have had little success in parallelizing non-numeric codes due to ...