Parallel Program Debugging based on Data-Replay
Abstract
The nondeterministic nature of parallel programs is the major difficulty in debugging them. Order-replay, a technique that addresses this problem, is widely used because of its small overhead. It has, however, several serious drawbacks: all processes of the parallel program have to participate in the replay, even when some of them are clearly not involved in the bug, and the programmer cannot stop the process being debugged at an arbitrary point. We adopt another method for deterministic replay, Data-replay, which logs the contents of events rather than their order and makes it possible to run and stop each process independently. Data-replay also combines well with reverse-execution mechanisms. We applied the Data-replay mechanism to MPI-based parallel programs. The results of our experiment with the NAS Parallel Benchmarks show that our mechanism works at a practical cost: logging communicated data incurs only 24% overhead, while it accelerates replayed execution by 38%, both on average.
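The core idea of Data-replay, logging what each receive returned rather than the order in which events occurred, can be illustrated with a small sketch. This is not the paper's MPI implementation: `RecordingChannel`, `ReplayChannel`, `worker`, and `run_demo` are hypothetical names, and the nondeterministic arrival order is simulated with a shuffled queue rather than real message passing.

```python
import random
from collections import deque


class RecordingChannel:
    """Hypothetical wrapper: logs the contents of every received message."""

    def __init__(self, recv_fn):
        self._recv = recv_fn
        self.log = []

    def recv(self):
        msg = self._recv()
        self.log.append(msg)  # log the data itself, not just its order
        return msg


class ReplayChannel:
    """Hypothetical replay source: serves messages from a log, with no peers."""

    def __init__(self, log):
        self._pending = deque(log)

    def recv(self):
        return self._pending.popleft()


def worker(channel, n_messages):
    """A process whose result depends on the order in which messages arrive."""
    total = 0
    for _ in range(n_messages):
        _source, value = channel.recv()
        total = total * 2 + value
    return total


def run_demo(seed=None):
    # Simulated nondeterminism: messages from three senders arrive in random order.
    arrivals = [(1, 5), (2, 7), (3, 11)]
    random.Random(seed).shuffle(arrivals)
    incoming = deque(arrivals)

    recorder = RecordingChannel(incoming.popleft)
    recorded = worker(recorder, len(arrivals))

    # Replay needs only this process and its own log; the senders are not rerun.
    replayed = worker(ReplayChannel(recorder.log), len(arrivals))
    return recorded, replayed


if __name__ == "__main__":
    print(run_demo())
```

Because the replayed worker reads only from its own log, it can be run, paused, or restarted without the other processes taking part, which is the property the abstract contrasts with Order-replay.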
Similar resources
LeBlanc and Mellor-Crummey: Debugging Parallel Programs with Instant Replay
The debugging cycle is the most common methodology for finding and correcting errors in sequential programs. Cyclic debugging is effective because sequential programs are usually deterministic. Debugging parallel programs is considerably more difficult because successive executions of the same program often do not produce the same results. In this paper we present a general solution for reprodu...
Optimal Record and Replay for Debugging of Nondeterministic MPI/PVM Programs
The record and replay technique has proved to be an effective solution for cyclic debugging of nondeterministic parallel programs. Because of nondeterminism, a parallel program given the same inputs on successive runs can sometimes produce different results. In this paper, an optimal record and replay technique is presented, which produces less overhead in time and space by using the non-overtaking r...
An Efficient Logical Clock for Replaying Message-Passing Programs
Cyclic debugging is one of the most important and most commonly used activities in program development. During cyclic debugging, the program is repeatedly re-executed to track down errors when a failure has been observed. The cyclic debugging approach often fails for parallel programs because parallel programs reveal nondeterministic characteristics due to message race conditions. Execution re...
An Implementation of Race Detection and Deterministic Replay with MPI
The Parallel Debugging Tool (PDT) of the Annai programming environment is developed within the Joint CSCS-ETH/NEC Collaboration in Parallel Processing. Similarly to the other components of the integrated environment, PDT aims to provide support for application developers to debug portable large-scale data-parallel programs based on HPF, and message-passing programs based on the MPI standard. Fo...
Cyclic Debugging for pSather, a Parallel Object-Oriented Programming Language
The paper discusses the main aspects of a parallel debugger for the parallel object-oriented language pSather. PSather provides for a single shared-address space and for multiple threads per processor. Threads can arbitrarily migrate between processors. The debugger supports cyclic debugging which is a standard and quite effective technique for sequential programs. To address nondeterminism, de...