A Dynamic Checkpointing and Rollback Recovery Solution Based on Task Switching
Authors
Abstract
Fault tolerance is an important issue in operating systems. Checkpointing and Rollback Recovery (CRR) is a key fault-tolerance technique, and its simplicity and effectiveness have led to its wide use in operating system fault handling. CRR can be divided into checkpoint storage and checkpoint restoration, and checkpoint storage is a key factor in the real-time performance of recovery. Existing checkpoint storage schemes are clock-driven and lack real-time responsiveness and flexibility. This paper proposes a dynamic CRR solution in which checkpoint storage occurs at task-switching time rather than at a clock interrupt. Applying the mechanism to SANC shows that it achieves high real-time performance in rollback recovery.
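The idea of storing a checkpoint at task-switching time can be illustrated with a toy model. The sketch below is not the paper's implementation and does not model SANC; the `Task` and `Scheduler` classes, the `switch_to` and `rollback` methods, and the dictionary-based task state are all hypothetical names chosen for illustration. It only shows the trigger: the outgoing task's state is saved when the scheduler switches away from it, instead of on a periodic timer.

```python
import copy


class Task:
    """Hypothetical task whose whole state is a plain dictionary."""

    def __init__(self, name):
        self.name = name
        self.state = {"counter": 0}


class Scheduler:
    """Toy scheduler (an assumption, not the paper's design) that
    checkpoints the outgoing task on every task switch."""

    def __init__(self, tasks):
        self.tasks = {task.name: task for task in tasks}
        self.current = None
        self.checkpoints = {}  # task name -> last saved copy of its state

    def switch_to(self, name):
        # Checkpoint storage is triggered by the switch itself,
        # not by a periodic clock interrupt.
        if self.current is not None:
            self.checkpoints[self.current.name] = copy.deepcopy(self.current.state)
        self.current = self.tasks[name]

    def rollback(self, name):
        # Restore the most recent checkpoint for the named task.
        # Raises KeyError if the task has never been switched out.
        self.tasks[name].state = copy.deepcopy(self.checkpoints[name])


if __name__ == "__main__":
    sched = Scheduler([Task("A"), Task("B")])
    sched.switch_to("A")
    sched.current.state["counter"] = 5
    sched.switch_to("B")                      # A is checkpointed here
    sched.tasks["A"].state["counter"] = 999   # simulated fault corrupts A
    sched.rollback("A")
    print(sched.tasks["A"].state)
```

In this model a task that runs for a long time without being switched out has no fresh checkpoint, which is one trade-off of tying storage to switches rather than to a clock.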
Similar resources
Blocking and Non-blocking Checkpointing and Rollback Recovery for Networks-on-Chip
In this paper we propose a dynamically reconfigurable failure recovery scheme developed for Network-on-Chip (NoC) based systems. The recovery scheme is based on a checkpointing and rollback protocol and permits enhancing the system fault tolerance capabilities by exploiting information on traffic load and failure rate. The increased performance of the fault tolerance mechanism is achieved by si...
Roll-Forward and Rollback Recovery: Performance-Reliability Trade-Off
Dhiraj K. Pradhan, Nitin H. Vaidya (Department of Computer Science, Texas A&M University, College Station, TX 77843-3112). Abstract: Performance and reliability achieved by a modular redundant system depend on the recovery scheme used. Typically, gain in performance using comparable resources results in reduced reliability. Several high-performance computers are not...
Dynamic Node Recovery in MANET for High Recovery Probability
One of the key design issues in ad hoc networks is the development of a rollback recovery model for providing fault tolerance in MANETs. Because energy in a MANET is limited, faults are more likely to occur. Hence, checkpointing is done at trusted nodes when faults are encountered, for successful rollback to the last saved state. This makes trust a vital factor to be de...
Checkpointing and Rollback Recovery in Distributed Systems: Existing Solutions, Open Issues and Proposed Solutions
Checkpointing and rollback recovery are well-established techniques for dealing with failures in distributed systems. In this paper, we briefly summarize the existing approaches to these problems and discuss the open issues, suggested approaches, and some preliminary work that we have done to address them.
Survey of Backward Error Recovery Techniques for Multicomputers Based on Checkpointing and Rollback
To implement fault tolerance in multicomputer systems, backward error recovery based on checkpointing and rollback is often used. During failure-free operation, the process states are regularly saved, and after a fault is detected, the system is rolled back to a previously saved state. We can distinguish four classes of techniques: semi-automatic techniques, message logging, coordinated ch...