Experimental Study of Register Saturation in Basic Blocks and Super-Blocks: Optimality and heuristics
Authors
Abstract
Register saturation (RS) is the exact maximal register need over all valid schedules of a data dependence graph [4]. Computing it optimally is NP-complete. This report proposes two heuristics for computing the acyclic RS of directed acyclic graphs (DAGs). The first improves on the earlier greedy-k heuristic [4]: it approximates the RS more closely with equivalent computation time. The second is faster than greedy-k and also approximates the RS better, but sacrifices the computation of saturating values. To evaluate these two heuristics, we designed an optimal combinatorial algorithm that computes the exact RS for tractable cases, which turns out to be satisfactory in practice. Extensive experiments were conducted on thousands of data dependence graphs extracted from the FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. The numerical results demonstrate the efficiency of the two proposed heuristics, so they can replace the greedy-k heuristic presented in [4]. Our RS computation methods are distributed as an independent C library (RSlib) under the LGPL licence.
Key-words: Compilation, Code optimisation, Register saturation, Instruction level parallelism
inria-00431103, version 1 - 27 Nov 2009
Résumé (translated from the French): Register saturation (RS) is the exact, attainable upper bound on the register need over all possible schedules of a data dependence graph [4]. Since computing the RS is NP-complete, this report presents two heuristics for computing the acyclic RS (for the basic blocks and super-blocks of a program). The first heuristic improves on the greedy-k algorithm [4] in how closely it approximates the optimal RS value, with equivalent computation time. The second heuristic is faster than greedy-k and approximates the optimal RS better, but sacrifices the computation of saturating values. To test the efficiency of our heuristics, we designed an optimal combinatorial algorithm for computing the RS. Despite the NP-completeness of the problem, this optimal algorithm solves enough practical instances to support a satisfactory study. Large-scale experiments were conducted on the data dependence graphs of the FFMPEG, MEDIABENCH, SPEC2000 and SPEC2006 benchmarks. The numerical results show that our two new heuristics are efficient and fast, so they can replace greedy-k. The optimal (exponential) method and the heuristics are distributed in an independent C library (RSlib) under the LGPL licence.
Mots-clés (translated): Compilation, code optimisation, register saturation, instruction-level parallelism
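To make the definition of register saturation concrete, the following is a minimal brute-force sketch, not the report's optimal algorithm, its heuristics, or the RSlib API. It rests on simplifying assumptions that are ours: schedules are sequential (one operation per step), latencies are unitary, there is a single register type, and a value is alive from just after its defining operation until its last consumer executes. The names `register_need`, `register_saturation` and the `consumers` dictionary are hypothetical.

```python
from itertools import permutations


def register_need(order, consumers):
    """Maximal number of simultaneously alive values for one sequential schedule."""
    step = {node: i for i, node in enumerate(order)}
    # Assumption: a value is alive on steps t with def < t <= last_use.
    # Values without consumers are ignored.
    intervals = [
        (step[u], max(step[v] for v in users))
        for u, users in consumers.items()
        if users
    ]
    return max(
        (sum(1 for d, k in intervals if d < t <= k) for t in range(len(order))),
        default=0,
    )


def is_valid(order, consumers):
    """A schedule is valid if every producer runs before all of its consumers."""
    step = {node: i for i, node in enumerate(order)}
    return all(step[u] < step[v] for u, users in consumers.items() for v in users)


def register_saturation(consumers):
    """Exhaustive search over all valid orders (exponential, small DAGs only)."""
    nodes = list(consumers)
    return max(
        register_need(order, consumers)
        for order in permutations(nodes)
        if is_valid(order, consumers)
    )


# Hypothetical DAG: s = a + b, t = s + c. Each key maps a node to its consumers.
dag = {"a": ["s"], "b": ["s"], "c": ["t"], "s": ["t"], "t": []}

print(register_need(["a", "b", "s", "c", "t"], dag))  # 2
print(register_need(["a", "b", "c", "s", "t"], dag))  # 3
print(register_saturation(dag))                       # 3
```

In this toy DAG the register need depends on the schedule (2 or 3), and the saturation is the largest of these values. The exhaustive enumeration grows factorially with the number of nodes, which illustrates why heuristics are of interest for larger graphs.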
Similar resources
Effect of Super-Brain Yoga on Consolidation and Reconsolidation-Based Stabilization of Motor Memory
Introduction: In recent years, yoga has been used as an intervention to improve memory function. The present study aims to investigate the effect of super-brain yoga practices on the consolidation and reconsolidation of motor memory in young girls. Materials and Methods: Participants were 24 young girls in Lendeh, selected based on the inclusion criteria of the study and then randomly divided i...
Pseudo Zernike Moment-based Multi-frame Super Resolution
The goal of multi-frame Super Resolution (SR) is to fuse multiple Low Resolution (LR) images to produce one High Resolution (HR) image. The major challenge of classic SR approaches is accurate motion estimation between the frames. To handle this challenge, a fuzzy motion estimation method has been proposed that replaces the value of each pixel using the weighted averaging all its neighboring pixels i...
Constraint Programming Techniques for Optimal Instruction Scheduling
Modern processors have multiple pipelined functional units and can issue more than one instruction per clock cycle. This puts great pressure on the instruction scheduling phase in a compiler to expose maximum instruction level parallelism. Basic blocks and superblocks are commonly used regions of code in a program for instruction scheduling. Instruction scheduling coupled with register allocati...
Spill-Free Parallel Scheduling of Precedence Graphs
Keywords: VLIW scheduling, register allocation. This paper concerns the problem of spill-free scheduling of acyclic precedence graphs on a processor with multiple functional units and a limited number of registers. The problem of minimizing the schedule length is well known to be computationally intractable. We present a heuristic for the problem, a general divide-and-conquer paradigm that converts any ins...
Evaluation of Algorithms for Local Register Allocation
Local register allocation (LRA) assigns pseudo-registers to actual registers in a basic block so as to minimize the spill cost. In this paper, four different LRA algorithms are compared with respect to the quality of their generated allocations and the execution times of the algorithms themselves. The evaluation is based on a framework that views register allocation as the combination of bounda...