Search results for: coordinated checkpointing
Number of results: 48092
For implementing fault tolerance in multicomputer systems, backward error recovery, based on checkpointing and rollback, is often used. During failure-free operation, the process states are regularly saved, and after a fault is detected, the system is rolled back to a previously saved state. We can distinguish four classes of techniques: semi-automatic techniques, message logging, coordinated ch...
This paper addresses the definition of recovery lines in the context of backward recovery, whose aim is to cope with failures in distributed systems. A general framework that allows for several semantics of recovery lines is introduced. Key notions such as missing messages and orphan messages are precisely defined and their impact on the definition of consistency of recovery lines is carefully an...
We propose a minimum-process coordinated checkpointing algorithm for non-deterministic mobile distributed systems, where no useless checkpoints are taken. An effort has been made to minimize the blocking of processes and synchronization message overhead. We capture the partial transitive dependencies during the normal execution by piggybacking dependency vectors onto computation messages. Frequ...
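The piggybacking idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the `Process` class, the boolean vector representation, and the message dictionary layout are all assumptions made for illustration. Each process keeps one flag per process, attaches a copy of its vector to every outgoing computation message, and merges the attached vector on receipt, so an initiator can read off which processes it transitively depends on.

```python
class Process:
    """Hypothetical process that tracks dependencies with a boolean vector."""

    def __init__(self, pid, n):
        self.pid = pid
        # dep[i] is True if this process (transitively) depends on process i.
        self.dep = [False] * n
        self.dep[pid] = True

    def send(self, payload):
        # Piggyback a copy of the current dependency vector on the message.
        return {"sender": self.pid, "payload": payload, "dep": list(self.dep)}

    def receive(self, msg):
        # Merge: we now depend on everything the sender depended on.
        self.dep = [mine or theirs for mine, theirs in zip(self.dep, msg["dep"])]
        return msg["payload"]

    def checkpoint_set(self):
        # Processes that would need to checkpoint if this process initiated.
        return {i for i, flag in enumerate(self.dep) if flag}


p0, p1, p2 = Process(0, 3), Process(1, 3), Process(2, 3)
p1.receive(p0.send("a"))   # p1 now depends on p0
p2.receive(p1.send("b"))   # p2 now depends on p1 and, transitively, p0
print(p2.checkpoint_set())
print(p0.checkpoint_set())
```

A real protocol would typically also reset the vector after a checkpoint commits and handle messages in flight during coordination; both are omitted here.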
Checkpointing and rollback recovery is a simple technique for fault tolerance. The state of a process is saved to a disk file from which the process can recover when a failure occurs. In this paper we describe the implementation of FTOP (Fault Tolerant PVM), a coordinated checkpointing library integrated with PVM. Existing PVM applications require only minor changes for incorporating faul...
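The basic mechanism described here, saving process state to a disk file and restoring it after a failure, can be sketched for a single process as follows. This is not FTOP's API: the function names `save_checkpoint` and `load_checkpoint` and the use of `pickle` are assumptions chosen for illustration. The state is written to a temporary file and then renamed, so a crash during the write does not corrupt the previous checkpoint.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write state to path atomically (hypothetical helper, not FTOP)."""
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())   # push the bytes to stable storage
        os.replace(tmp, path)      # atomic rename over the old checkpoint
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default
```

On restart, a process would call `load_checkpoint` first and resume from the returned state. Coordinating such files across several processes is the part that a coordinated checkpointing library adds, and it is not shown here.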
In this paper, we present a single phase non-blocking coordinated checkpointing algorithm suitable for mobile computing environments. The distinct advantages that make the proposed algorithm suitable for distributed mobile computing systems are the following. It produces a consistent set of checkpoints, without the overhead of taking temporary checkpoints; the algorithm makes sure that only min...
This paper presents an efficient scheme to implement optimistic message logging and asynchronous recovery for the mobile computing environment. Most coordinated checkpointing schemes may not be suitable for the mobile environment, since the unreliable mobile hosts and the fragile network connection may hinder any kind of coordination for checkpointing and recovery. In this paper,...
Coordinated checkpointing systems are popular and general-purpose tools for implementing process migration, coarse-grained job swapping, and fault-tolerance on networks of workstations. Though simple in concept, there are several design decisions concerning the placement of checkpoint files that can impact the performance and functionality of coordinated checkpointers. Although several such che...
Fault tolerance techniques enable systems to perform tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes in the system do not share memory, a global state of the system is defined as a set of local states, one from each process. In case of a fault in distributed systems, checkpointing enables the execu...
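The notion of a global state as one local state per process can be made concrete with a small consistency check. The sketch below uses the commonly cited criterion that a global state is consistent when it contains no orphan message, meaning a message recorded as received whose send is not recorded. The event numbering, the `Message` type, and the function names are assumptions for illustration and are not taken from the abstract.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sender: int
    send_event: int   # index of the send event in the sender's local history
    receiver: int
    recv_event: int   # index of the receive event in the receiver's history


def orphans(cut, messages):
    """Messages received inside the cut but sent outside it.

    cut maps each process id to the index of its last checkpointed event.
    """
    return [
        m for m in messages
        if m.recv_event <= cut[m.receiver] and m.send_event > cut[m.sender]
    ]


def is_consistent(cut, messages):
    # A set of local checkpoints is treated as consistent if no message
    # is an orphan with respect to it.
    return not orphans(cut, messages)


# Process 0 sends at its event 3; process 1 receives at its event 2.
history = [Message(sender=0, send_event=3, receiver=1, recv_event=2)]
print(is_consistent({0: 2, 1: 2}, history))
print(is_consistent({0: 3, 1: 2}, history))
```

The first cut records the receive but not the send, so the message is an orphan; the second cut records both events.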
Checkpointing is one of the commonly used techniques to provide fault-tolerance in distributed systems so that the system can operate even if one or more components have failed. However, mobile computing systems are constrained by low bandwidth, mobility, lack of stable storage, frequent disconnections and limited battery life. Hence, checkpointing protocols having a smaller number of synchronizat...