Feedback-Directed Optimizations with Estimated Edge Profiles from Hardware Event Sampling
Authors
Abstract
Traditional feedback-directed optimization (FDO) uses static instrumentation to collect profiles. This method has shown good application performance gains, but it is not commonly used in practice because of the high runtime overhead of profile collection, the tedious dual-compile usage model, and the difficulty of generating representative training data sets. In this paper, we show that edge frequency estimates can be constructed successfully with heuristics from profile data collected by sampling hardware events. The approach incurs low runtime overhead (e.g., less than 2%) and requires no instrumentation, yet achieves competitive performance gains. Our initial results show a 3-4% performance gain on the SPEC C benchmarks.
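The abstract does not say which heuristics are used to turn sampled counts into edge frequencies. As a rough illustration of the general idea only, the sketch below assumes per-basic-block execution counts have already been derived from hardware event samples and fills in edge counts by simple flow conservation: when all but one edge into or out of a block is known, the remaining edge gets the leftover count. The function name `estimate_edge_counts`, the diamond-shaped control-flow graph, and the counts are all hypothetical, and this is not presented as the paper's algorithm.

```python
from collections import defaultdict


def estimate_edge_counts(block_counts, edges):
    """Estimate CFG edge counts from basic-block counts by flow conservation.

    Assumption: the entry block has no predecessors and the exit block has
    no successors inside `edges`. Edges that cannot be resolved are omitted.
    """
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for edge in edges:
        src, dst = edge
        out_edges[src].append(edge)
        in_edges[dst].append(edge)

    known = {}
    changed = True
    while changed:
        changed = False
        for block, count in block_counts.items():
            # A block's count should equal the sum over its outgoing edges
            # and also the sum over its incoming edges.
            for group in (out_edges[block], in_edges[block]):
                unknown = [e for e in group if e not in known]
                if len(unknown) == 1:
                    rest = sum(known[e] for e in group if e in known)
                    # Sampled counts are noisy, so clamp at zero.
                    known[unknown[0]] = max(count - rest, 0)
                    changed = True
    return known


# Hypothetical diamond CFG: A branches to B or C, both rejoin at D.
cfg_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
counts = {"A": 100, "B": 70, "C": 30, "D": 100}
estimated = estimate_edge_counts(counts, cfg_edges)
```

In this toy case every edge is resolved from the counts of B and C alone. With real sampled data the block counts are usually inconsistent with one another, which is why a practical system needs additional heuristics or a smoothing step beyond this propagation.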
Similar resources
Feedback-Directed Optimizations in GCC with Estimated Edge Profiles from Hardware Event Sampling
Traditional feedback-directed optimization (FDO) in GCC uses static instrumentation to collect edge and value profiles. This method has shown good application performance gains, but is not commonly used in practice due to the high runtime overhead of profile collection, the tedious dual-compile usage model, and difficulties in generating representative training data sets. In this paper, we show...
Using Large Input Sets with Hardware Performance Monitoring for Profile Based Compiler Optimizations
Traditional Profile Guided Optimization (PGO) uses program instrumentation with one or more small training input data sets to generate edge or value profiles to guide compiler optimizations. This approach has been effective in predicting branch directions for many applications. However, for optimizations that are more dependent on the performance characteristics and the accuracy of the profiles...
ProfileMe: Hardware Support for Instruction-Level Profiling on Out-of-Order Processors
Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor’s performance monitoring hardware is an effective, unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts events, such as cache misses and branch mispredictions, and cannot accurately attribute these events to instructions, especially ...
Adaptive Sampling of Performance Counters
Many applications of profiling based on sampling of Performance Counters (PC), such as feedback-directed optimization and software reliability, are often constrained by the amount of information that can be obtained without significantly perturbing the behavior of the profiled task. Current implementations of event- and time-based sampling software use fixed or random sampling periods which a...
Implementing the render cache and the edge-and-point image on graphics hardware
The render cache and the edge-and-point image (EPI) are techniques that permit high quality rendering at interactive rates of models illuminated with complex ray traced techniques, combining sparse sampling and discontinuities-respecting interpolation. The image reconstruction is decoupled from the samples generation process and permits the use of arbitrary shaders to gather shading samples. Al...