Selecting Benchmarks Combinations for the Evaluation of Multicore Throughput
Authors
Abstract
Most high-performance processors today can execute multiple threads simultaneously. Threads share processor resources, such as the last-level cache, and this sharing may decrease throughput in non-obvious ways, depending on thread characteristics. Computer architects usually study multiprogrammed workloads by considering a set of benchmarks and some combinations of these benchmarks. Because cycle-accurate microarchitecture simulators are slow, we want a set of combinations that is as small as possible, yet representative. However, there is no standard method for selecting such a sample, and different authors have used different methods. It is not clear how the choice of a particular sample affects the conclusions of a study. We propose and compare different sampling methods for defining multiprogrammed workloads for computer architecture. We evaluate their effectiveness on a case study, the comparison of several multicore last-level cache replacement policies. We show that random sampling, the simplest method, is a robust way to define a representative sample of workloads, provided the sample is big enough. We propose a method for estimating the required sample size based on fast approximate simulation. We also propose a new method, workload stratification, which is very effective at reducing the sample size in situations where random sampling would require large samples.
Key-words: workload selection, multicore, sampling methods, replacement policies, throughput methods
hal-00737446, version 2, 8 Nov 2012
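As a rough illustration of the two sampling ideas named in the abstract, the sketch below draws a uniform random sample of workloads and computes a textbook stratified estimate of a mean. It is a minimal sketch under stated assumptions, not the authors' procedure: workloads are assumed to be unordered combinations of benchmarks with repetition allowed, the benchmark names are placeholders, and the helper names (`workload_population`, `random_sample`, `stratified_mean`) are hypothetical.

```python
import itertools
import math
import random


def workload_population(benchmarks, cores):
    """All unordered combinations of `cores` benchmarks, repeats allowed (an assumption)."""
    return list(itertools.combinations_with_replacement(benchmarks, cores))


def random_sample(population, size, seed=0):
    """Draw `size` distinct workloads uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(population, size)


def stratified_mean(strata, metric, per_stratum, seed=0):
    """Estimate the population mean of `metric` from a few draws per stratum.

    Each stratum's sample mean is weighted by the stratum's share of the
    population. This is generic stratified sampling; every stratum must be
    non-empty.
    """
    rng = random.Random(seed)
    total = sum(len(stratum) for stratum in strata)
    estimate = 0.0
    for stratum in strata:
        draws = rng.sample(stratum, min(per_stratum, len(stratum)))
        mean = sum(metric(w) for w in draws) / len(draws)
        estimate += (len(stratum) / total) * mean
    return estimate


benchmarks = ["b0", "b1", "b2", "b3", "b4", "b5"]  # placeholder benchmark names
cores = 2
population = workload_population(benchmarks, cores)
sample = random_sample(population, size=5)

# The population size should match the multiset-coefficient formula.
print(len(population), math.comb(len(benchmarks) + cores - 1, cores))
print(sample)
```

In this sketch the strata would have to be built beforehand, for example by grouping workloads whose approximate-simulation results are similar; how the strata are actually formed in the paper is not described in the abstract.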
Similar resources
Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems
Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architecture for future multi-core systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate and flexible communications for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...
Multicore OS Benchmarks: We Can Do Better
Current multicore OS benchmarks do not provide workloads that sufficiently reflect real-world use: they typically run a single application, whereas real workloads consist of multiple concurrent programs. In this paper we show that this lack of mixed workloads leads to benchmarks that do not fully exercise the OS and are therefore inadequate at predicting real-world behavior. This implies that e...
A Clustering Approach to Scientific Workflow Scheduling on the Cloud with Deadline and Cost Constraints
One of the main features of High Throughput Computing systems is the availability of high power processing resources. Cloud Computing systems can offer these features through concepts like Pay-Per-Use and Quality of Service (QoS) over the Internet. Many applications in Cloud computing are represented by workflows. Quality of Service is one of the most important challenges in the context of sche...
Optimizing Code by Selecting Compiler Flags using Parallel Genetic Algorithm on Multicore CPUs
Compiler optimization phase ordering poses challenges not only to compiler developers but also to multithreaded programmers seeking to enhance the performance of multicore systems. Many compilers have numerous optimization techniques which are applied in a predetermined order. This ordering of optimization techniques may not always give optimal code; further, it is impossible to find a unanim...
Predicting cardiac arrhythmia on ECG signal using an ensemble of optimal multicore support vector machines
The use of artificial intelligence in the process of diagnosing heart disease has been considered by researchers for many years. In this paper, an efficient method for selecting appropriate features extracted from electrocardiogram (ECG) signals, based on a genetic algorithm for use in an ensemble of multi-kernel support vector machine classifiers, each of which is based on an optimized genetic al...