A Non-Blocking Recovery Algorithm for Causal Message Logging
Authors
Abstract
In the recovery of failed processes in a distributed program, causal logging schemes offer several benefits. These benefits include no rollback of unfailed processes and simple approaches to output commit. Unfortunately, previous approaches to the recovery of multiple simultaneous failures require that the distributed execution be blocked or that recovering processes coordinate. The latter requires assumptions which are not satisfactory. In this paper we present a solution that has neither of these drawbacks.

Message logging is an important technique for recovering from failures in distributed programs. This technique logs the order in which messages are received. By assuming that receive ordering is the only source of non-determinism, execution is recoverable using this ordering. Pessimistic message logging [4, 11] forces a process to wait, before sending any message, until the message log is written to stable storage. Optimistic logging methods [9, 12, 13, 15] (and the similar sender-based logging [8, 14]) assume failures are rare and therefore allow ordering information to be lost in a failure. (That is, a message is logged in the background while execution proceeds.) Consequently, received messages and any sends that depend on them may not be recoverable. This may then require that unfailed processes roll back their execution as well.

Causal message logging sends message receive ordering information with each message. This information includes receives and their causal history since the last send. The Manetho approach [6] uses this method. In family-based message logging (FBL) [2], causal history information for only K processes is included. This method then tolerates K simultaneous failures, rather than the failure of all processes in the system (as with Manetho and the other logging methods).
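The piggybacking idea described above can be illustrated with a minimal Python sketch. It is not the algorithm of this paper, nor Manetho or FBL: the names `Determinant`, `Message`, `Process`, `recovered_order`, and `demo` are hypothetical, and for simplicity each send carries every determinant the sender knows, rather than only the causal history since its last send.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Determinant:
    """Records that `receiver` delivered message (sender, send_seq) as its recv_seq-th receive."""
    receiver: str
    sender: str
    send_seq: int
    recv_seq: int


@dataclass
class Message:
    sender: str
    send_seq: int
    payload: str
    piggyback: frozenset  # determinants carried along with the message


class Process:
    """Hypothetical process that keeps receive-order information in volatile memory only."""

    def __init__(self, name):
        self.name = name
        self.send_seq = 0
        self.recv_seq = 0
        self.determinants = set()  # own receives plus causal history learned from others

    def send(self, payload):
        # Simplifying assumption: piggyback everything known, not just the delta.
        self.send_seq += 1
        return Message(self.name, self.send_seq, payload, frozenset(self.determinants))

    def receive(self, msg):
        # Absorb the sender's causal history, then record this receive.
        self.determinants |= msg.piggyback
        self.recv_seq += 1
        self.determinants.add(
            Determinant(self.name, msg.sender, msg.send_seq, self.recv_seq)
        )


def recovered_order(failed, survivors):
    """Rebuild the failed process's receive order from determinants held by survivors."""
    known = {d for p in survivors for d in p.determinants if d.receiver == failed}
    return [(d.sender, d.send_seq) for d in sorted(known, key=lambda d: d.recv_seq)]


def demo():
    p, q, r = Process("p"), Process("q"), Process("r")
    m1 = q.send("a")
    m2 = r.send("b")
    p.receive(m2)           # p happens to deliver r's message first
    p.receive(m1)
    q.receive(p.send("c"))  # p's send carries both of its determinants to q
    # p now fails and loses its volatile state; q and r survive.
    return recovered_order("p", [q, r])
```

Calling `demo()` returns the order in which `p` delivered its messages, reconstructed purely from what `q` learned through piggybacking. If `p` had failed before sending anything, no survivor would hold its determinants, but no survivor's state would depend on those receives either.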
The causal message logging approach offers advantages over the other message logging schemes. It allows processes to execute without blocking (like optimistic logging) and never forces unfailed processes to roll back their execution (like pessimistic logging). Unfortunately, causal message logging suffers from complications associated with recovery that are not present in the other logging methods. One particular difficulty occurs when multiple processes fail simultaneously [7]. Solutions have been presented which require blocking unfailed processes or coordinating between recovering processes. Neither of these solutions is satisfactory. In this paper we present a solution without either of these drawbacks. We note that, independently, Alvisi, Rao, and Vin have also developed an algorithm for non-blocking recovery [3].

Supported in part by a Virginia & Ernest Cockrell fellowship. Supported in part by NSF Grants ECS-9414780 and CCR-9520540, Texas Education Board Grant ARP-320, a General Motors Fellowship, and an IBM grant.
Similar resources
The Relative Overhead of Piggybacking in Causal Message Logging Protocols
Message logging protocols ensure that crashed processes make the same choices when re-executing nondeterministic events during recovery. Causal message logging protocols achieve this by piggybacking the results of these choices (called determinants) on the ambient message traffic. By doing so, these protocols do not create orphan processes nor introduce blocking in failure-free executions. To s...
New Causal Message Logging Protocol with Asynchronous Checkpointing for Distributed Systems
Causal message logging is an efficient approach for tolerating failures of processes in distributed systems because it has the advantages of both the pessimistic and optimistic message logging approaches. However, traditional causal message logging protocols prevent live processes from continuously executing their computation and require some synchronous logging to the stable storage during recovery....
Scalable Causal Message Logging for Wide-Area Environments
Causal message logging spreads recovery information around the network in which the processes execute. This is an attractive property for wide area networks: it can be used to replicate processes that are otherwise inaccessible due to network partitions. However, current causal message logging protocols do not scale to thousands of processes. We describe the Hierarchical Causal Logging Protocol ...
The Cost of Recovery in Message Logging Protocols
Past research in message logging has focused on studying the relative overhead imposed by pessimistic, optimistic, and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. Our results suggest that applications face a complex trade-off when choosing a message logging protocol for fault to...
Design Patterns for Log-Based Rollback Recovery
Log-based rollback recovery builds on the ideas of checkpoint-based rollback recovery and improves the characteristics of the recovery process. The basic idea captured by the log-based rollback recovery techniques is an extension of the checkpoint idea. Only, instead of relying solely on checkpoints for recovering from the occurrence of an error, the system logs information about the non-determi...