Adaptive Tracking of Cross-Thread Dependences
Abstract
OCTET is a framework for dynamic analysis that soundly captures cross-thread dependences in parallel programs. It optimistically assumes that most accesses do not conflict, which lets the instrumentation avoid synchronization at non-conflicting accesses. However, OCTET’s performance can suffer substantially if an application triggers more than a small fraction of conflicting accesses: on the order of 0.1% or more of all accesses, according to our investigations. This paper introduces an adaptive approach that replaces as many heavyweight conflicting transitions as possible with so-called “pessimistic” transitions. An adaptive policy uses online profiling to decide whether an object should switch from an optimistic to a pessimistic state, or from a pessimistic to an optimistic state. Experimental results show that this approach can reduce the overhead of OCTET by 30–40% for one program, while adding low overhead for applications that do not have many conflicting accesses.

Note to the reader. This technical report (TR) describes recent preliminary innovations and results. In its current form, this TR may not stand on its own, and readers may need to read our prior work on OCTET for background [3]. Questions, suggestions, and feedback are welcome.

1. Background and Motivation

Writing and debugging parallel programs has been a longstanding challenge. Researchers have proposed various solutions for guaranteeing reliable concurrency, among which dynamic, software-only analyses have perhaps the most potential to be practical. To check and enforce concurrency correctness, these analyses rely on tracking cross-thread dependences (i.e., conflicting accesses to shared memory). A fundamental problem is how to track cross-thread dependences soundly and efficiently.

OCTET soundly captures cross-thread dependences in an optimistic way: its analysis adds low overhead at non-conflicting accesses, but conflicting accesses require expensive communication between threads [3].
The analysis tracks the “locality state” of each object; instrumentation at loads and stores uses the state to identify conflicting accesses and updates the state if needed. OCTET’s optimistic design optimizes the common case, based on the observation that most accesses are compatible with the state. It slows programs by 26% on average [3], which is significantly faster than comparable prior work targeting commodity systems.

However, for an application that performs many conflicting accesses, even though these accesses are still not the majority of all accesses, OCTET’s overhead increases drastically, and it becomes much slower than the naïve pessimistic model [3] or von Praun and Gross’s state model [8]. We find empirically that if the ratio of conflicting transitions to all accesses is roughly 0.1% or more, the round-trip coordination could incur significant overhead. Another important factor (that we have not yet included in our model) is that some invocations of the coordination protocol are more expensive than others, e.g., the explicit protocol is more expensive than the implicit protocol, and RdSh → WrEx transitions are more expensive than other conflicting transitions. Table 1 shows the ratio of state transitions for each category.

             Same state     Conflicting    Upgrading or fence
eclipse6     99.9984%       0.0011%        0.00050%
hsqldb6      99.73%         0.16%          0.11%
lusearch6    99.99971%      0.00018%       0.00011%
xalan6       99.80%         0.12%          0.080%
avrora9      99.81%         0.095%         0.098%
jython9      99.9999985%    0.0000012%     0.00000030%
luindex9     99.99982%      0.00011%       0.000065%
lusearch9    99.99985%      0.00011%       0.000040%
pmd9         99.988%        0.0068%        0.0047%
sunflow9     99.999920%     0.000036%      0.000045%
xalan9       99.84%         0.094%         0.063%
pjbb2000     99.89%         0.055%         0.052%
pjbb2005     99.16%         0.51%          0.33%

Table 1. The fraction of all accesses that trigger each kind of OCTET state transition (including “same state” transitions). We round each percentage x as much as possible such that x and 100% − x each have two significant digits.
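To make the transition categories of Table 1 concrete, the following is a minimal sketch of per-object locality-state tracking. It is not OCTET's implementation: it assumes the WrEx, RdEx, and RdSh states of the prior OCTET work [3], and it omits the RdSh counter, fence transitions, and the coordination protocol entirely. The names ObjState, access, profile, conflict_fraction, and CONFLICT_THRESHOLD are hypothetical, introduced only for illustration; the 0.1% threshold is the empirical figure quoted above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Simplified locality states, following the OCTET state names (assumption:
# the RdSh counter and fence transitions are left out of this sketch).
WR_EX, RD_EX, RD_SH = "WrEx", "RdEx", "RdSh"

# Empirical ratio of conflicting transitions to all accesses at which
# overhead becomes significant (0.1%, as reported in the text).
CONFLICT_THRESHOLD = 0.001


@dataclass
class ObjState:
    kind: str = WR_EX
    owner: Optional[int] = None  # owning thread id; None while in RdSh


def access(state: ObjState, thread: int, is_write: bool) -> str:
    """Apply one access and return 'same', 'upgrading', or 'conflicting'."""
    if state.kind == RD_SH:
        if not is_write:
            return "same"                      # shared read stays in RdSh
        state.kind, state.owner = WR_EX, thread
        return "conflicting"                   # RdSh -> WrEx
    if state.owner == thread:
        if state.kind == WR_EX or not is_write:
            return "same"                      # access compatible with state
        state.kind = WR_EX
        return "upgrading"                     # RdEx_T -> WrEx_T, same thread
    if state.kind == RD_EX and not is_write:
        state.kind, state.owner = RD_SH, None
        return "upgrading"                     # RdEx_T1 -> RdSh on read by T2
    # Any other access by a different thread needs coordination in OCTET.
    state.kind = WR_EX if is_write else RD_EX
    state.owner = thread
    return "conflicting"


def profile(trace, first_owner: int) -> Counter:
    """Count transition categories for one object over (thread, is_write) pairs."""
    state = ObjState(WR_EX, first_owner)       # new objects start as WrEx
    return Counter(access(state, t, w) for t, w in trace)


def conflict_fraction(counts: Counter) -> float:
    total = sum(counts.values())
    return counts["conflicting"] / total if total else 0.0


if __name__ == "__main__":
    trace = [(1, True), (1, False), (2, False), (2, True),
             (1, False), (3, False), (3, False), (1, True)]
    counts = profile(trace, first_owner=1)
    print(dict(counts), conflict_fraction(counts) >= CONFLICT_THRESHOLD)
```

An adaptive policy of the kind this report proposes could consult such per-object counts when deciding whether an object should leave the optimistic states; the actual policy is not specified in this excerpt, so the sketch stops at profiling.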
We collected the results in Table 1 on a 4-core system using the same methodology as in Section 4. For 12 of the 13 applications we tested, fewer than 0.2% of all state transitions are conflicting. pjbb2005 has 0.51% conflicting transitions, and OCTET experiences a 1.8X slowdown (Section 4). Executing on another platform using 32 cores, this
Similar resources
Drinking from Both Glasses: Adaptively Combining Pessimistic and Optimistic Synchronization for Efficient Parallel Runtime Support
It is notoriously challenging to achieve parallel software systems that are both scalable and reliable. Parallel runtime support—such as multithreaded record & replay, data race and atomicity violation detectors, transactional memory, and support for stronger memory models—helps achieve these goals, but existing commodity solutions slow programs substantially in order to capture (track or contr...
Tracking Conflicting Accesses Efficiently for Software Record and Replay
Record and replay, which records a multithreaded program’s execution in one run and reproduces it deterministically in a second run, is useful for program debugging, fault detection and analysis. The key challenge in multithreaded record and replay is ensuring that conflicting, cross-thread accesses to shared variables are properly detected, recorded and reproduced. Numerous solutions have been...
Efficient Deterministic Replay of Multithreaded Programs Based on Efficient Tracking of Cross-Thread Dependences
Shared-memory parallel programs are inherently nondeterministic, making it difficult to diagnose rare bugs and to achieve deterministic execution, e.g., for replication. Existing multithreaded record & replay approaches have serious limitations such as relying on custom hardware or slowing programs by an order of magnitude. This paper introduces an approach for multithreaded record & replay bas...
Adaptive Memory Synchronization (AMS): Balancing the Risks and Benefits of Inter-thread Load Speculation
Speculative parallelization (SP) enables a processor to extract multiple threads from a sequential instruction stream and execute them in parallel. For speculative parallelization to achieve high performance on integer programs, loads must speculate on the data dependences among threads. Techniques for speculating on inter-thread data dependences have a first-order impact on the performance, po...
OCTET: Practical Concurrency Control for Dynamic Analyses and Systems
Parallel programming is essential for reaping the benefits of parallel hardware, but it is notoriously difficult to develop and debug reliable, scalable software systems. One key challenge is that modern languages and systems provide poor support for ensuring concurrency correctness properties—such as atomicity, sequential consistency, and multithreaded determinism—because all existing approach...