Predicting the Scalability of an STM: A Pragmatic Approach

Authors

Aleksandar Dragojević

Rachid Guerraoui

Abstract

Conducting a thorough performance evaluation of an STM is very time consuming. Depressingly, even with all this effort, and even for the same application, it can still be hard to predict performance when the number of threads on which the application is deployed differs from the number used in the experiments. One might have to conduct an entire new set of experiments to understand the performance of the STM with the new number of threads. We propose a pragmatic approach to help change this state of affairs. Using classical engineering approximation techniques, we extract analytical performance functions from a set of STM performance measurements to model the scalability of the STM. More specifically, we show that polynomial and rational functions provide good interpolations of STM performance: even with only a handful of measurements, the average error in most cases is around 1-2%. Further, we show that rational functions allow reasonably precise extrapolation: using measurements with up to m threads, we can predict the performance up to roughly 2m threads with a relatively low error (around 10% in the best cases). We discuss two possible applications of our approach: (1) statically deciding whether to use an STM for a given workload and a given number of threads, and (2) dynamically adjusting the number of threads that execute in parallel to match the optimal concurrency level of a given workload.

Keywords: Software Transactional Memory, Performance, Scalability.

1. Overview

As its name indicates, Software Transactional Memory (STM) is built purely in software: part of its appeal is its independence from any specific hardware support. In principle, an STM can work on any hardware and for any size of transactions and data structures [3, 10, 12, 17, 21, 24, 26].
Yet, and not surprisingly, the performance, and actually the relevance, of the STM are highly sensitive to the target application, the workload, the underlying architecture, and the actual number of threads used for parallelizing the code. Figure 1(a) depicts the speedup of parallel code that uses SwissTM [12] over sequential, non-instrumented code, for two different workloads from the STAMP benchmark suite [6]. The figure shows that, while the STM scales very well on the vacation low workload, outperforming sequential code by a factor of almost 30 with 64 threads, its scalability is not nearly as good on intruder, where it outperforms sequential code by a factor of only 2 with 64 threads.

In fact, even the same application that uses STM can have highly varying performance depending on the workload configuration. Figure 1(b) illustrates this. The lower-contention, read-dominated STMBench7 [16] workload achieves a speedup factor of almost 12 with 64 threads, but the higher-contention, write-dominated workload of the same benchmark is less than 2 times faster than non-instrumented sequential code. Moreover, two recent studies [8, 11] drew contradictory conclusions about the scalability of an STM even on the same subset of benchmarks.

Not only can the performance of different STM workloads differ significantly, it is also very difficult to predict how it will evolve as the number of available threads (cores) increases. In general, experience shows that even if the scalability of an STM looks great for a range of thread counts, performance might actually drop (slightly or significantly) after some point: contention can simply become too high with too many threads. But knowing at which point this happens is hard without intensive experiments. For the workloads in Figure 1, the performance peaks at 22 threads for intruder, 32 for STMBench7 write, and 52 for STMBench7 read, while for vacation low it keeps improving up to 64 threads.
Clearly, this state of affairs might put potential adopters of STM in a difficult position: should they write their application with an STM in mind and expect performance to improve on a new, yet to be purchased, architecture with more cores? Or should they simply keep hacking with their old lock-based techniques and forget about the STM? Certain rules of thumb do exist. For example, if different application threads mostly access disjoint data, and if the majority of accesses are reads, resulting in lower contention, the application is probably a good candidate for parallelization with STM, and scaling it to a larger number of threads would certainly prove beneficial. Also, if only a small portion of the code actually uses transactions, the overheads of STM will not be significant and the application is likely to benefit from STM. However, these rules are somewhat vague and not easily applicable in all cases.

Ideally, an automated tool would analyze the atomic blocks of the application and, based on their characteristics, predict the scalability of the STM on that application. The analysis could be partially static, based on the source code, but would most likely also require analysis based on profiling executions. Generating the code for the profiling executions is indeed feasible: (1) the programmer needs to identify atomic code blocks even if some other parallelization technique is used, and (2) STM compilers [3, 13, 17, 28] automatically produce STM code, eliminating the need for manual instrumentation. Nevertheless, atomic blocks have a wide variety of characteristics (e.g., duration, number and type of transactional accesses) and it is not completely clear which of these characteristics are relevant for predicting the performance. Furthermore, some of the characteristics are difficult to capture in a meaningful and concise way (e.g., transaction access sets).
All of this makes the design of a tool for predicting scalability based on atomic block characteristics a daunting task.

[Figure 1(a): speedup (y-axis, 0 to 30) versus number of threads (x-axis, 1 to 64) for the vacation low and intruder workloads.]
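
The alternative outlined in the abstract, fitting an analytical function to a handful of measurements and then extrapolating, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the low-order rational form (a0 + a1·n) / (1 + b1·n), the linearized least-squares fit, the helper names, and the synthetic "measurements". None of it is the authors' actual procedure or data, which this excerpt does not show.

```python
"""Fit speedup(n) ~ (a0 + a1*n) / (1 + b1*n) to a few measurements.

Illustrative only: the model form, helper names, and data are assumptions.
"""


def solve_linear(matrix, rhs):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def fit_rational(threads, speedups):
    """Least-squares fit of the linearized model y = a0 + a1*x - b1*x*y."""
    rows = [[1.0, float(x), -float(x) * y] for x, y in zip(threads, speedups)]
    # Normal equations: (A^T A) c = A^T y, with c = (a0, a1, b1).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, speedups)) for i in range(3)]
    return tuple(solve_linear(ata, aty))


def predict(coeffs, n):
    """Evaluate the fitted rational function at thread count n."""
    a0, a1, b1 = coeffs
    return (a0 + a1 * n) / (1.0 + b1 * n)


def hypothetical_speedup(n):
    """Synthetic stand-in for real measurements (not data from the paper)."""
    return n / (1.0 + 0.05 * n)


# "Measure" with up to m = 16 threads, then extrapolate to 2m = 32 threads.
measured_threads = [1, 2, 4, 8, 16]
measured_speedups = [hypothetical_speedup(n) for n in measured_threads]
coeffs = fit_rational(measured_threads, measured_speedups)
predicted_at_32 = predict(coeffs, 32)
print(f"predicted speedup at 32 threads: {predicted_at_32:.3f}")
```

Because the synthetic data is noise-free and follows the assumed form exactly, the fit recovers it almost perfectly. Real measurements are noisy, and a workload whose performance peaks and then declines would need a higher-order denominator, so the extrapolation error on real data would be expected to be considerably larger.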

Publication date: 2010
