Multi-objective optimisations for a superscalar architecture with selective value prediction
Abstract
This work extends an earlier manual design space exploration (DSE) of our superscalar architecture with Selective Load Value Prediction to include the unified L2 cache. We then perform an automatic design space exploration with a specially developed software tool, varying several architectural parameters. Our goal is to find optimal configurations in terms of CPI (cycles per instruction) and energy consumption. With the 19 architectural parameters we propose to vary, the design space exceeds 2.5 million billion (about 2.5 × 10^15) configurations, so only heuristic search is feasible. We therefore propose different methods of automatic design space exploration based on our FADSE tool, which allow us to evaluate only 2500 configurations of this huge design space. The experimental results show that, under the proposed multi-objective approach, our automatic DSE provides significantly better configurations than our previous manual DSE approach.
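To illustrate what "better configurations" means when CPI and energy are both minimised, the following is a minimal sketch of Pareto-dominance filtering. It is not the FADSE implementation. The function names and the (CPI, energy) values are hypothetical, chosen only for illustration, and are not results from the paper.

```python
from typing import List, Tuple

# A configuration's objectives: (CPI, energy). Both are to be minimised.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in both objectives and strictly better in at least one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, de-duplicated and sorted by CPI."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(set(front))


# Hypothetical (CPI, energy) pairs; NOT measurements from the paper.
evaluated = [(1.20, 9.0), (1.00, 12.0), (1.10, 10.0), (1.30, 9.5), (1.00, 13.0)]
front = pareto_front(evaluated)
print(front)
```

A heuristic multi-objective search would evaluate only a small sample of the design space and report such a non-dominated set instead of a single "best" configuration, since lowering CPI typically costs energy and vice versa.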
Similar resources
Exploiting selective instruction reuse and value prediction in a superscalar architecture
In our previously published research we discovered some branches that are very difficult to predict, called unbiased branches. Since the overall performance of modern processors is seriously affected by misprediction recovery, these difficult branches in particular are an important source of performance penalties. Our statistics show that about 28% of branches are dependent on critical Load instructio...
Performance Limits Due to Inter-Cluster Data Forwarding in Wire-Limited ILP Microprocessors
The growing speed gap between transistors and wire interconnects is forcing the development of distributed, or clustered, architectures. These designs partition the chip into small regions with fast intra-cluster communication. Longer latency is required to communicate between clusters. The hardware and/or software is responsible for scheduling instructions to clusters such that critical path c...
Selective Harmonics Elimination Technique in Cascaded H-Bridge Multi-Level Inverters Using the Salp Swarm Optimization Algorithm
A new optimization method is proposed in this paper for finding the firing angles in multi-level voltage source inverters to eliminate low-order selective harmonics and reduce the total harmonic distortion (THD) of the output voltage. To this end, a Fourier series is used for calculating the objective function and selecting specific harmonics. Regarding the nature and complexity of the employed no...
Delayed Branches Versus Dynamic Branch Prediction in a High-Performance Superscalar Architecture
While delayed branch mechanisms were popular with the designers of RISC processors, most superscalar processors deploy dynamic branch prediction to minimise run-time branch penalties. We propose a generalised branch delay mechanism that is more suited to superscalar processors. We then quantitatively compare the performance of our delayed branch mechanism with run-time branch prediction, in the...
Journal title:
- IET Computers & Digital Techniques
Volume 6, issue
Pages -
Publication date 2012