Piranha: Exploiting Single-Chip Multiprocessing
Authors
Abstract
... transparently reordering instructions from nearby program regions, but even sophisticated compiler scheduling is fundamentally limited by the compiler’s inability to perfectly determine the programmer’s intent and its commitment to preserve the program’s high-level structure and semantics. Given the amount of parallel work being done, we could conceivably build a superscalar processor with an instruction window large enough to simultaneously contain code from different program regions (specifically, different functions or loop iterations). However, over and above the many engineering obstacles, maintaining a large, contiguous window full of useful instructions poses a fundamental problem. Specifically, the decreasing accuracy of a series of branch predictions leads to an exponentially decreasing likelihood that instructions at the tail of the window will be useful. Overcoming this problem requires a model that lets parallelism from different program regions be exploited in a reasonably independent (that is, noncontiguous and nonserial) manner. The speculative multithreading model considers each program region to be a speculative thread or small program. By executing multiple speculative threads in parallel, high degrees of concurrency can be achieved in an aggregate fashion, especially if each thread is mostly sequential. The model subsequently merges the threads to recreate the original program. Speculative multithreading lets us fashion a large instruction window ...
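The exponential decay described in the abstract can be illustrated with a small sketch. It assumes a simplified model that is not stated in the source: every branch prediction is independent and has the same accuracy, so an instruction that sits behind n predicted branches is useful only if all n predictions were correct. The function name and the sample accuracy are hypothetical, chosen for illustration only.

```python
def tail_usefulness(accuracy: float, branches: int) -> float:
    """Probability that an instruction behind `branches` predicted branches
    is on the correct path.

    Assumption (illustrative, not from the source): predictions are
    independent and share the same per-branch accuracy, so the
    probability is accuracy ** branches.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if branches < 0:
        raise ValueError("branches must be non-negative")
    return accuracy ** branches


if __name__ == "__main__":
    # Hypothetical per-branch accuracy; real predictors vary by workload.
    accuracy = 0.95
    for depth in (1, 8, 16, 32, 64):
        print(f"{depth:3d} branches deep: {tail_usefulness(accuracy, depth):.3f}")
```

Under these assumptions, a 95% accurate predictor leaves only about a 0.19 chance that an instruction 32 branches deep is useful, which is why a single contiguous window stops paying off as it grows.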
Similar resources
Exploiting On-Chip Data Transfers for Improving Performance of Chip-Scale Multiprocessors
Compared to a complex single-processor system, on-chip multiprocessors are less complex, more power efficient, and easier to test and validate. In this work, we focus on an on-chip multiprocessor where each processor has a local memory (or cache). We demonstrate that, in such an architecture, allowing each processor to issue off-chip memory requests on behalf of other processors can impro...
Integrating Parallelizing Compilation Technology and Processor Architecture for Cost-Effective Concurrent Multithreading
As the number of transistors on a single chip continues to grow, it is important to think beyond the traditional approaches of compiler optimizations for deeper pipelines and wider instruction issue units to improve performance. This single-threaded execution model limits these approaches to exploiting only the relatively small amount of instruction-level parallelism available in application pr...
C-slow Technique vs Multiprocessor in designing Low Area Customized Instruction set Processor for Embedded Applications
The demand for high-performance embedded processors for consumer electronics has been increasing rapidly over the past few years. Many of these embedded processors depend upon a custom-built Instruction Set Architecture (ISA), such as game processors (GPU), multimedia processors, DSP processors, etc. The primary requirement for the consumer electronics industry is low cost with high performance and low power cons...
Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor
Thread-level speculation (TLS) makes it possible to parallelize general purpose C programs. This paper proposes software and hardware mechanisms that support speculative thread-level execution on a single-chip multiprocessor. A detailed analysis of programs using the TLS execution model shows a bound on the performance of a TLS machine that is promising. In particular, TLS makes it feasible to ...
Exploiting the Potential of a Network of IRAMs
Recently, a great deal of research has gone into reducing the gap in performance between processors and their memory systems. Techniques such as prefetching have been developed in order to hide the long latencies involved in retrieving data from off-chip DRAM. However, applications with irregular access patterns generally see greatly reduced benefit from these techniques, and latencies are becomi...