TTL inter-task communication implementation on a shared-memory multiprocessor platform
Authors
Abstract
TTL is an abstract task-level interface that is used both for developing parallel application models and as a platform interface for implementing streaming applications on multiprocessor architectures. TTL defines the inter-task communication (ITC). The CAKE platform that we target consists of homogeneous communicating tiles. Each tile combines a shared memory with a heterogeneous mix of MIPS and TriMedia processors, DSPs and hardware accelerators. Because task synchronization and data transfer take place in shared memory, we investigate architectural issues related to cache coherence and memory data copying. We suggest optimizations such as padding-array insertion, prefetching and postflushing. We also explain alternative implementations that use either a semaphore or an index/pointer as the synchronization construct. A prototype design and cycle-true simulations with the Producer-Consumer model and the JPEG decoder application demonstrate that, compared with an initial implementation, we can achieve almost 80% improvement in Cycles Per Token transfer (CPT) and reduce the total cycles of running the JPEG decoder application by 23%.
Keywords: inter-task communication; synchronization; shared-memory multiprocessor; semaphore; cache coherence
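The index-based synchronization and padding-array insertion mentioned in the abstract can be illustrated with a minimal sketch in C. This is not the TTL implementation from the paper: the names `channel_t`, `chan_init`, `chan_try_write` and `chan_try_read`, the 64-byte cache-line size and the 8-slot capacity are all assumptions made for illustration, and the sketch relies on C11 atomics over hardware-coherent caches rather than the explicit prefetching and postflushing the abstract describes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed cache-line size in bytes */
#define CHAN_SLOTS 8   /* hypothetical channel capacity (power of two) */

/* Single-producer, single-consumer token channel in shared memory.
 * The padding arrays keep the two indices on separate cache lines,
 * so a write to one index does not invalidate the line holding the other. */
typedef struct {
    atomic_uint write_idx;                        /* written only by the producer */
    char pad0[CACHE_LINE - sizeof(atomic_uint)];  /* padding array */
    atomic_uint read_idx;                         /* written only by the consumer */
    char pad1[CACHE_LINE - sizeof(atomic_uint)];  /* padding array */
    int32_t tokens[CHAN_SLOTS];
} channel_t;

static void chan_init(channel_t *ch) {
    /* Power-of-two capacity keeps slot indexing consistent when the
     * unsigned counters wrap around. */
    assert((CHAN_SLOTS & (CHAN_SLOTS - 1)) == 0);
    atomic_init(&ch->write_idx, 0);
    atomic_init(&ch->read_idx, 0);
}

/* Producer side: returns false when the channel is full. */
static bool chan_try_write(channel_t *ch, int32_t token) {
    unsigned w = atomic_load_explicit(&ch->write_idx, memory_order_relaxed);
    unsigned r = atomic_load_explicit(&ch->read_idx, memory_order_acquire);
    if (w - r == CHAN_SLOTS) {
        return false;  /* no free slot */
    }
    ch->tokens[w % CHAN_SLOTS] = token;
    /* Publish the token: the release store orders the data write first. */
    atomic_store_explicit(&ch->write_idx, w + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the channel is empty. */
static bool chan_try_read(channel_t *ch, int32_t *token) {
    unsigned r = atomic_load_explicit(&ch->read_idx, memory_order_relaxed);
    unsigned w = atomic_load_explicit(&ch->write_idx, memory_order_acquire);
    if (w == r) {
        return false;  /* no token available */
    }
    *token = ch->tokens[r % CHAN_SLOTS];
    /* Free the slot for the producer. */
    atomic_store_explicit(&ch->read_idx, r + 1, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, no lock or semaphore is needed in this sketch; a semaphore-based variant would instead block the caller when the channel is full or empty.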
Similar resources
Hybrid Vs Memory-to-memory Communication in Multi-core Processor
Nowadays, multi-core architectures introduce new challenges for the effective implementation of inter-core communication, which plays an important role in balancing delay in a multicore processor. The two mechanisms used for inter-core communication are shared-memory and message-passing communication. Shared-memory communication fails to provide sufficient scalability with...
Scalable Inter-Cluster Communication Systems for Clustered Multiprocessors
As workstation clusters move away from uniprocessors in favor of multiprocessors to support the increasing computational needs of distributed applications, greater demands are placed on the communication interfaces that couple individual workstations. This paper investigates scalable, efficient, and reliable communication systems for multiprocessor clusters that use commodity local area networks ...
Performance Prediction and Evaluation of Parallel Processing on a NUMA Multiprocessor
Non-Uniform Memory Access (NUMA) architectures make it possible to build large-scale shared memory multiprocessor systems in comparison with non-scalable Uniform Memory Access (UMA) architectures. Most NUMA multiprocessor operations such as scheduling and synchronizing processes, accessing data from processors to memory models and allocating distributed memory space to different processors, are p...
An Asynchronous Model of Global Parallel Genetic Algorithms
Genetic algorithms usually require more computation power than other heuristic approaches do. In this paper we introduce an efficient implementation of an asynchronous global parallel genetic algorithm with 3-tournament elimination selection. The parallelization of the algorithm is achieved through a multithreading mechanism, a very effective and easy-to-implement technique. With parallelization w...
Kernel-Kernel Communication in a Shared-Memory Multiprocessor
In the standard kernel organization on a shared-memory multiprocessor all processors share the code and data of the operating system; explicit synchronization is used to control access to kernel data structures. Distributed-memory multicomputers use an alternative approach, in which each instance of the kernel performs local operations directly and uses remote invocation to perform remote opera...