Performance Advantages of Merging Instruction- and Data-Level Parallelism
Abstract
This report presents a new architecture based on adding a vector pipeline to a superscalar microprocessor. The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute regular vectorizable code at a performance level that cannot be achieved using only ILP techniques. We present an analysis of the two paradigms at the instruction set architecture (ISA) level showing that the DLP model has several advantages: it executes fewer instructions, performs fewer overall operations (by factors as large as 1.7), and generally makes fewer memory accesses. We then analyze the ILP model in terms of instructions per cycle (IPC). Our simulations show that a 4-way machine achieves IPCs in the range 1.03-1.52 and that, when scaling to 16-way, only 26% of the peak IPC is achieved. We have also studied the DLP machine, and the results show that although it is less sensitive to memory latency, its use is limited to highly vectorizable code. The combined ILP+DLP model is shown to perform from 1.24 to 2.84 times better than the 4-way ILP machine. Moreover, when we scale up the ILP+DLP machine, the speedup over the 16-way ILP machine increases to as much as 3.45. All this extra performance is shown to be achieved with very modest control hardware, thus ensuring that clock cycle time is not jeopardized in our proposed architecture.
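The IPC figures in the abstract can be put on a common scale with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that the peak IPC of an N-way machine equals its issue width N, and the function names are hypothetical.

```python
def fraction_of_peak(ipc: float, issue_width: int) -> float:
    """Fraction of peak IPC, ASSUMING peak IPC equals the issue width."""
    return ipc / issue_width


def ipc_from_fraction(fraction: float, issue_width: int) -> float:
    """IPC implied by a given fraction of peak, under the same assumption."""
    return fraction * issue_width


# 4-way ILP machine: reported IPC range of 1.03 to 1.52.
low = fraction_of_peak(1.03, 4)
high = fraction_of_peak(1.52, 4)

# 16-way ILP machine: reported to reach only 26% of peak IPC.
wide = ipc_from_fraction(0.26, 16)

print(f"4-way: {low:.3f} to {high:.3f} of peak")
print(f"16-way at 26% of peak: about {wide:.2f} IPC")
```

Under that assumption, the 4-way machine uses roughly a quarter to two fifths of its issue slots, and the 16-way machine sustains about 4 instructions per cycle despite four times the issue width.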
Related resources
Simultaneous multithreaded vector architecture: merging ILP and DLP for high performance
The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single simultaneous vector multithreaded architecture to execute regular vectorizable code at a performance level that cannot be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low co...
A case for merging the ILP and DLP paradigms
The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that cannot be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that ...
Multi-core processors - An overview
Microprocessors have revolutionized the world we live in, and continuous efforts are being made to manufacture not only faster chips but also smarter ones. A number of techniques, such as data-level parallelism, instruction-level parallelism and hyper-threading (Intel’s HT), already exist and have dramatically improved the performance of microprocessor cores [1, 2]. This paper briefs on evoluti...
Branch merging for scheduling concurrent executions of branch operatio - Computers and Digital Techniques, IEE Proceedings-
Branches are a major limiting factor to instruction-level parallelism. One solution is to execute several branches simultaneously using multiway branching architectures. Such architectures are especially important when the instruction issue width becomes large. The authors study the problem of compile-time scheduling of branch operations on such architectures: an optimisation called branch merg...
Explicit Dynamic Scheduling: A Practical Micro-Dataflow Architecture
This paper introduces Explicit Dynamic Scheduling (EDS), a practical implementation of dataflow on a chip. By combining RISC design principles with well-known compiler dependence analysis techniques, EDS combines a straightforward hardware design, suitable for high speed implementation, with the performance advantages of dataflow at the instruction level. EDS unifies pipeline and memory latency to...