Explore Be-Nice Instruction Scheduling in Open64 for an Embedded SMT Processor
نویسندگان
چکیده
A SMT processor can fetch and issue instructions from multiple independent hardware threads at every CPU cycle. Therefore, hardware resources are shared among the concurrently-running threads at a very fine grain level, which can increase the utilization of processor pipeline. However, the concurrently-running threads in a SMT processor may interfere with each other and stall the CPU pipeline. We call this kind of pipeline stall inter-thread stall (ITS for short) or thread interlock. In this paper, we present our study on the ITS problem on an embedded heterogeneous SMT processor. Our experiments demonstrate that, for some test cases, 50% of the total pipeline stalls are caused by ITS. Therefore, we have developed a new instruction scheduling algorithm called be-nice instruction scheduling, based on Open64 Global Code Motion, to coordinate the conflicts between concurrent threads. The instruction scheduler uses the thread interference information (obtained by profiling) as heuristics to decrease the number of ITS without sacrificing the overall CPU performance. The experimental results show that, for our current test cases the be-nice instruction scheduler can reduce 15% of the inter-thread stall cycles, and increase the IPC of the critical thread by 2%-3%. The experiments are performed using the Open64 compiler infrastructure.
منابع مشابه
Soft Real- Time Scheduling on Simultaneous Multithreaded Processors
Simultaneous multithreading (SMT) improves processor throughput by processing instructions from multiple threads each cycle. This is the first work to explore soft real-time scheduling on an SMT processor. Scheduling with SMT requires two decisions: (1) which threads to run simultaneously (the co-schedule), and (2) how to share processor resources among co-scheduled threads. We explore algorith...
متن کاملPetri Net Analysis of Non-Redundant and Redundant Execution Schemes
The quest for high-performance has led to multiand many-core systems. To push the performance of a single core to the limit, simultaneous multithreading (SMT) is used. SMT enables to fetch different instructions from different threads, hiding latencies in other threads. SMT also gives the opportunity to execute redundant threads (redundant multithreading, RMT) and thus to detect faults by compa...
متن کاملUnderstanding the Impact of Inter-Thread Cache Interference on ILP in Modern SMT Processors
Simultaneous Multithreading (SMT) has emerged as an effective method of increasing utilization of resources in modern super-scalar processors. SMT processors increase instruction-level parallelism (ILP) and resource utilization by simultaneously executing instructions from multiple independent threads. Although simultaneously sharing resources benefits system throughput, coscheduled threads oft...
متن کاملEnergy Efficient Dual Issue Embedded Processor
While energy efficiency is essential to extend the battery life of embedded devices, performance cannot be ignored. High performance superscalar embedded processors are more energy efficient than low performance scalar processors, however, they consume more power which is very limited in battery operated deeply embedded industrial devices. In this paper we propose an energy efficient dual issue...
متن کاملCASH: Revisiting Hardware Sharing in Single-Chip Parallel Processors
As the increasing of issue width has diminishing returns with superscalar processor, thread parallelism with a single chip is becoming a reality. In the past few years, both SMT (Simultaneous MultiThreading) and CMP (Chip MultiProcessor) approaches were first investigated by academics and are now implemented by the industry. In some sense, CMP and SMT represent two extreme design points. In thi...
متن کامل