Compiler-assisted Hybrid Operand Communication
Abstract
Communication of operands among in-flight instructions can be power intensive, especially in superscalar processors where all result tags are broadcast to a small number of consumers through a multi-entry CAM. Token-based point-to-point communication of operands in dataflow architectures is highly efficient when each produced token has only one consumer, but inefficient when there are many consumers because software fanout trees must be constructed. Placing operands in registers is efficient for broadcasting values whose consumers are spread over a long lifetime, but inefficient for shorter-lived operands. This paper evaluates a compiler-assisted hybrid instruction communication model that combines token-based instruction communication with statically assigned broadcast tags. Each fixed-size block of code is given a small number of architectural broadcast identifiers, which the compiler can assign to producers that have many consumers. Producers with few consumers rely on point-to-point communication through tokens. Producers whose result is live past the instruction block communicate with distant consumers through a register. Having the compiler select the mechanism statically relieves the hardware from categorizing instructions at runtime. At the same time, a compiler can categorize instructions better than dynamic selection does because the compiler analyzes a larger range of instructions. Furthermore, the compiler can perform complex optimizations without hardware cost or execution-time penalty. We also propose a compiler optimization that reuses broadcast tags for instructions with non-overlapping broadcast live ranges, which further improves speedup without consuming more power. The results show that this compiler-assisted hybrid token/broadcast model requires only eight architectural broadcasts per block, enabling highly efficient CAMs.
This hybrid model reduces instruction communication energy by 28% compared to a strictly token-based dataflow model (and by over 2.7X compared to a hybrid model without compiler support), while simultaneously increasing performance by 8% on average across the SPECINT and EEMBC benchmarks, running as single threads on 16 composed, dual-issue EDGE cores.
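The static selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' compiler pass. The names `Producer` and `assign_mechanisms`, the two-target token limit, and the greedy first-free tag choice are all assumptions made for illustration; only the three mechanisms (token, broadcast, register), the budget of eight broadcast identifiers per block, and the idea of reusing a tag across non-overlapping broadcast live ranges come from the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Producer:
    name: str
    index: int                                      # position of the producer in its block
    consumers: list = field(default_factory=list)   # positions of in-block consumers
    live_out: bool = False                          # result is needed after the block ends


def assign_mechanisms(producers, max_token_targets=2, num_tags=8):
    """Pick a communication mechanism for each producer in one block.

    max_token_targets is an assumed limit on point-to-point targets;
    num_tags=8 follows the eight broadcasts per block in the abstract.
    """
    plan = {}
    # For each broadcast tag, the position of the last consumer it serves so far.
    tag_busy_until = [-1] * num_tags
    for p in sorted(producers, key=lambda q: q.index):
        mechanisms = []
        if p.live_out:
            # Values live past the block reach distant consumers via a register.
            mechanisms.append("register")
        if p.consumers:
            if len(p.consumers) <= max_token_targets:
                # Few consumers: point-to-point tokens are enough.
                mechanisms.append("token")
            else:
                # Many consumers: look for a tag whose earlier broadcast
                # live range ended before this producer (tag reuse).
                tag = next(
                    (t for t, busy in enumerate(tag_busy_until) if busy < p.index),
                    None,
                )
                if tag is None:
                    # No free tag: fall back to a software fanout tree.
                    mechanisms.append("token-fanout")
                else:
                    tag_busy_until[tag] = max(p.consumers)
                    mechanisms.append(f"broadcast:{tag}")
        plan[p.name] = mechanisms
    return plan


block = [
    Producer("a", 0, [1, 2, 3]),
    Producer("b", 1, [4]),
    Producer("c", 2, [5, 6, 7]),
    Producer("d", 4, [5, 6, 8]),
    Producer("e", 9, [], live_out=True),
]
plan = assign_mechanisms(block, num_tags=2)
```

With two tags, producer `d` reuses tag 0 because the live range of `a` ends at position 3, before `d` is issued at position 4. A real compiler would likely weigh more factors when choosing which producers deserve a tag; this sketch simply serves them in program order.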