Area and Performance Optimization of Barrier Synchronization on Multi-core Network-on-Chips
Authors
Abstract
Barrier synchronization is widely used to synchronize the execution of parallel processor cores on multi-core Networks-on-Chip (NoCs). Because its global nature can cause heavy serialization and a large performance penalty, barrier synchronization must be carefully designed for low-latency communication and minimal overall completion time. In this paper, we therefore propose a fast barrier synchronization mechanism targeting multi-core NoCs. The mechanism includes a dedicated hardware module, named Fast Barrier Synchronizer (FBS), integrated with each processor node. The FBS offers a set of barrier counters and can concurrently process synchronization requests issued by the local node and by remote nodes via the on-chip network. The salient feature of the mechanism is that, once the barrier condition is reached, the “barrier release” acknowledgement is broadcast to all processor nodes. Broadcasting saves chip area, because source node information need not be stored, and minimizes completion time, because barrier releasing is not serialized. Synthesis results suggest that the FBS can run at over 1 GHz in SMIC 130 nm technology with small area overhead. We implemented an FBS-enhanced multi-core NoC architecture on our FPGA platform, which uses a Xilinx Virtex 5 FPGA. FPGA utilization and simulation results show that our fast barrier synchronization has both area and performance advantages over a counterpart that uses unicast barrier releasing.
Keywords: Barrier Synchronization; Multi-core; Network-on-
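The contrast between broadcast and unicast barrier release described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the FBS hardware design: the class `BarrierCounter`, its fields, and the `"ALL"` marker for a broadcast message are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BarrierCounter:
    """Hypothetical model of one barrier counter (not the paper's RTL)."""
    expected: int              # number of nodes that must arrive
    broadcast: bool            # True: broadcast release, False: unicast release
    count: int = 0             # arrivals seen so far
    sources: list = field(default_factory=list)  # used only for unicast release

    def arrive(self, node_id):
        """Register one arrival and return the release messages to inject."""
        self.count += 1
        if not self.broadcast:
            # Unicast release must remember every source node to reply to it.
            self.sources.append(node_id)
        if self.count < self.expected:
            return []          # barrier condition not yet reached
        if self.broadcast:
            releases = ["ALL"]             # a single broadcast message
        else:
            releases = list(self.sources)  # one message per stored source
        # Reset the counter so it can be reused for the next barrier.
        self.count = 0
        self.sources.clear()
        return releases


if __name__ == "__main__":
    for mode in (True, False):
        barrier = BarrierCounter(expected=4, broadcast=mode)
        for node in range(4):
            released = barrier.arrive(node)
        print("broadcast" if mode else "unicast", released)
```

In this model the broadcast variant keeps only a count and emits one release message, while the unicast variant has to store one source identifier per arrival and emit one release per node, which is the area and serialization cost the abstract refers to.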
Similar resources
Robust Control Synchronization on Multi-Story Structure under Earthquake Loads and Random Forces using H∞ Algorithm
In this paper, synchronization control is combined with robust H∞ control to evaluate seismic response control of multi-story structures. To show the accuracy of the novel algorithm, a five-story structure is evaluated under the El Centro earthquake load. In order to find the performance of the novel algorithm, random and uncertainty processes corresponding...
Improving Performance of Collection-Oriented Operations through Parallel Fusion
To more fully utilize the potential offered by multi-core processors, programming languages must have features for expressing parallelism. One promising approach is collection-oriented operations, which are easily expressed by the programmer and can be implemented by the runtime system in a parallel fashion for improved performance. However, the ordinary implementation requires a barrier synchr...
Application Mapping onto Network-on-Chip using Bypass Channel
The growing number of cores integrated on a chip, together with the problems of systems-on-chip, led to the emergence of networks-on-chip (NoCs). NoCs offer features such as scalability and high performance. The NoC architecture provides the communication infrastructure, so that the blocks, through their communication with one another, form the NoC. Due to the increasing number of cores, the placement of the cores i...
Operator Fusion in a Data Parallel Library
To more fully utilize the potential offered by multi-core processors, programming languages must have features for expressing parallelism. One promising approach is collection-oriented operations, which are easily expressed by the programmer and can be implemented by the runtime system in a parallel fashion for improved performance. However, the ordinary implementation requires a barrier synchr...
Heterogeneous Networks of Workstations across Wide Area Networks Be Accepted in Partial Fulfillment of the Requirements for the Degree of Bachelor of Science in Computer Engineering
Networks made up of various systems are scattered across wide area networks, and together they contribute to the heterogeneous environment of the computational grid. Whilst they are an immense source of computing resources, the core weakness of connecting these networks blindly together is that they are made up of links of varying speeds. Bottlenecks in communications occur due to the varied ...