Accurately modeling speculative instruction fetching in trace-driven simulation
Abstract
Evaluating the performance of modern, highly speculative, out-of-order microprocessors, and producing detailed, valid, accurate results, has become a serious challenge. A popular evaluation methodology is trace-driven simulation, which provides the advantage of a highly portable simulator that is independent of the constraints of the trace generation system. While a trace-driven simulator is easier to develop and maintain than the alternatives, a primary drawback is its inability to accurately simulate speculative instruction fetching and subsequent execution. Fetching from an incorrect path occurs often in a speculative processor; however, it is difficult to capture this information in a trace. This paper investigates a scheme to accurately model instruction fetching within a trace-driven framework. This is accomplished by recreating an approximate copy of the object code segment, which we call resurrected code, using a preliminary pass through the trace. We discuss a fast and memory-efficient method for implementing this resurrected code. In addition, we characterize UltraSPARC traces of C, C++, and Fortran programs generated using Shade to determine the potential of this method. Using these traces and a modest branch prediction scheme, we find that in 14 of 16 cases more than 99% of all branches will find their target instruction in the resurrected code. Furthermore, on these occasions, a large number of consecutive instructions are available along the mispredicted path. These results indicate that the inaccuracies associated with speculative fetching in trace-driven simulation can be significantly reduced using this resurrected code.

L. John is supported by the National Science Foundation under Grants CCR-9796098 (CAREER Award) and EIA9807112, and by a grant from the Texas Advanced Technology Program. F. Matus is also with Advanced Micro Devices.
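The idea of resurrected code described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the trace record layout `(pc, word)`, the names `resurrect_code` and `wrong_path`, the dictionary storage, and the fixed 4-byte instruction size are all assumptions made for illustration. The sketch makes one preliminary pass over the trace to collect every instruction seen, then looks up how many consecutive instructions are available starting at a mispredicted branch target.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical trace record: (program counter, raw instruction word).
TraceRecord = Tuple[int, int]

INSTR_BYTES = 4  # assumption: fixed 4-byte instructions, as on SPARC


def resurrect_code(trace: Iterable[TraceRecord]) -> Dict[int, int]:
    """Preliminary pass: keep one copy of every instruction the trace touched."""
    resurrected: Dict[int, int] = {}
    for pc, word in trace:
        # The first sighting wins; later dynamic instances of the same pc are skipped.
        resurrected.setdefault(pc, word)
    return resurrected


def wrong_path(resurrected: Dict[int, int], target: int, max_fetch: int = 8) -> List[int]:
    """Return the consecutive instructions available starting at `target`.

    The walk stops at the first address that never appeared in the trace,
    or after `max_fetch` instructions. Branches on the wrong path are not
    followed; this only models sequential fetch.
    """
    path: List[int] = []
    pc = target
    while len(path) < max_fetch and pc in resurrected:
        path.append(resurrected[pc])
        pc += INSTR_BYTES
    return path


# Toy trace: a short loop body plus one instruction at a distant address.
trace = [(0x1000, 0xA), (0x1004, 0xB), (0x1008, 0xC), (0x2000, 0xD), (0x1000, 0xA)]
code = resurrect_code(trace)
```

With this toy trace, a mispredicted branch to `0x1004` finds two consecutive instructions in the resurrected code, while a branch to an address the trace never visited, such as `0x3000`, finds none. A dictionary is the simplest storage choice here; the memory-efficient structure the abstract mentions is not specified in this text.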
Similar resources
Out-of-Order Instruction Fetch Using Multiple Sequencers
Conventional instruction fetch mechanisms fetch contiguous blocks of instructions in each cycle. They are difficult to scale since taken branches make it hard to increase the size of these blocks beyond eight instructions. Trace caches have been proposed as a solution to this problem, but they use cache space inefficiently. We show that fetching large blocks of contiguous instructions, or wide ...
Pre-execution via Speculative Data-driven Multithreading
This dissertation introduces pre-execution, a novel technique for accelerating sequential programs. Pre-execution directly attacks the instructions that cause performance problems: mispredicted branches and cache-missing loads. In pre-execution, future branch outcomes and load addresses are computed on the side and the results are fed to the main program. In doing so, the main program is spared ...
Can Trace-Driven Simulators Accurately Predict Superscalar Performance?
There are four crucial issues associated with performance simulators: simulator retargetability, simulator validation, simulation speed and simulation accuracy. This paper documents our experiences in developing performance simulators and our recent findings in using these simulators. We are concerned with all four of the crucial issues. Our first-generation tool, VMW, focused on achieving reta...
Modeled and Measured Instruction Fetching Performance for Superscalar Microprocessors
Instruction fetching is critical to the performance of a superscalar microprocessor. We develop a mathematical model for three different cache techniques and evaluate its performance both in theory and in simulation using the SPEC95 suite of benchmarks. In all the techniques, the fetching performance is dramatically lower than ideal expectations. To help remedy the situation, we also evaluate i...
Future Branches - Beyond Speculative Execution
The performance and hardware complexity of superscalar architectures are hindered by conditional branch instructions. When conditional branches are encountered in a program, the instruction fetch unit must rapidly predict the branch predicate and begin speculatively fetching instructions with no loss of instruction throughput. Speculative execution increases hardware cost, since speculative inst...