Performance Advantages of Merging Instruction- and Data-Level Parallelism

Authors

  • Francisca Quintana
  • Roger Espasa
  • Mateo Valero
Abstract

This report presents a new architecture based on adding a vector pipeline to a superscalar microprocessor. The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute regular vectorizable code at a performance level that cannot be achieved using only ILP techniques. We present an analysis of the two paradigms at the instruction set architecture (ISA) level showing that the DLP model has several advantages: it executes fewer instructions, fewer overall operations (by factors as large as 1.7), and generally performs fewer memory accesses. We then analyze the ILP model in terms of IPC. Our simulations show that a 4-way machine achieves IPCs in the range 1.03-1.52 and that, when scaled to 16-way, only 26% of the peak IPC is achieved. We have also studied the DLP machine; the results show that although it is less sensitive to memory latency, its use is limited to highly vectorizable code. The combined ILP+DLP model is shown to perform from 1.24 to 2.84 times better than the 4-way ILP machine. Moreover, when we scale up the ILP+DLP machine, the speedup over the 16-way ILP machine increases to as much as 3.45. All this extra performance is achieved with very modest control hardware, ensuring that clock cycle time is not jeopardized in our proposed architecture.
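
As a rough illustration (not taken from the paper itself), the kind of regular, vectorizable kernel the abstract refers to is a loop such as DAXPY. On a scalar, ILP-only machine each iteration retires several dynamic instructions (two loads, a multiply-add, a store, an index update, and a branch), so the instruction count grows with the element count n. A vector pipeline instead issues one load/compute/store group per strip of VL elements, which is the source of the reduced instruction and memory-access counts described above. The loop and instruction-count figures below are a sketch under these assumptions, not measurements from the paper.

    /* DAXPY: y[i] = a*x[i] + y[i] -- a classic vectorizable kernel.
     * Scalar view (illustrative): roughly 5-6 instructions retired per element.
     * Vector view (illustrative): roughly 3-4 instructions retired per strip of
     *   VL elements (vector load x, vector load y, vector multiply-add,
     *   vector store), plus a small amount of strip-mining overhead. */
    #include <stddef.h>

    void daxpy(size_t n, double a, const double *x, double *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

With a vector length of 64 or 128 elements, typical of classic vector machines, the dynamic instruction count for such a loop falls by one to two orders of magnitude relative to the scalar version, while the arithmetic operations and memory accesses stay the same or shrink slightly; this is consistent with the ISA-level comparison summarized in the abstract.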

Similar articles

Simultaneous multithreaded vector architecture: merging ILP and DLP for high performance

The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single simultaneous vector multithreaded architecture to execute regular vectorizable code at a performance level that cannot be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low co...

A case for merging the ILP and DLP paradigms

The goal of this paper is to show that instruction-level parallelism (ILP) and data-level parallelism (DLP) can be merged in a single architecture to execute vectorizable code at a performance level that cannot be achieved using either paradigm on its own. We will show that the combination of the two techniques yields very high performance at a low cost and a low complexity. We will show that ...

Multi-core processors - An overview

Microprocessors have revolutionized the world we live in, and continuous efforts are being made to manufacture not only faster chips but also smarter ones. A number of techniques such as data-level parallelism, instruction-level parallelism and hyper-threading (Intel's HT) already exist and have dramatically improved the performance of microprocessor cores. [1, 2] This paper briefs on evoluti...

Branch merging for scheduling concurrent executions of branch operations (IEE Proceedings - Computers and Digital Techniques)

Branches are a major limiting factor to instruction-level parallelism. One solution is to execute several branches simultaneously using multiway branching architectures. Such architectures are especially important when the instruction issue width becomes large. The authors study the problem of compile-time scheduling of branch operations on such architectures: an optimisation called branch merg...

Explicit Dynamic Scheduling: A Practical Micro-Dataflow Architecture

This paper introduces Explicit Dynamic Scheduling (EDS), a practical implementation of dataflow on a chip. By combining RISC design principles with well-known compiler dependence analysis techniques, EDS combines a straightforward hardware design, suitable for high-speed implementation, with the performance advantages of dataflow at the instruction level. EDS unifies pipeline and memory latency to...


Journal:

Volume   Issue

Pages   -

Publication year: 1998