coordinated checkpointing

Search results for: coordinated checkpointing

Number of results: 48092

Survey of Backward Error Recovery Techniques for Multicomputers Based on Checkpointing and Rollback

1993

G. Deconinck J. Vounckx R. Lauwereins J. A. Peperstraete

For implementing fault-tolerance in multicomputer systems, backward error recovery, based on checkpointing and rollback, is often used. During failure-free operation, the process states are regularly saved, and after a fault is detected, the system is rolled back to a previously saved state. We can distinguish four classes of techniques: semi-automatic techniques, message logging, coordinated ch...
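
As a rough illustration of the save-and-roll-back cycle this abstract describes, here is a minimal single-process sketch. The `Process` class, its dict-based state, and the in-memory `checkpoints` list standing in for stable storage are all assumptions made for illustration; none of them come from the survey.

```python
import copy


class Process:
    """Toy process whose entire state is one dict (illustrative assumption)."""

    def __init__(self):
        self.state = {"counter": 0}
        self.checkpoints = []  # hypothetical stand-in for stable storage

    def checkpoint(self):
        # Save a deep copy so later mutations cannot alter the saved state.
        self.checkpoints.append(copy.deepcopy(self.state))

    def rollback(self):
        # Restore the most recently saved state.
        self.state = copy.deepcopy(self.checkpoints[-1])


p = Process()
p.state["counter"] = 5
p.checkpoint()            # state saved during failure-free operation
p.state["counter"] = 9    # work done after the checkpoint
p.rollback()              # fault detected: work since the checkpoint is discarded
```

In a multicomputer setting each process would hold such checkpoints, and the difficulty the listed techniques address is choosing a mutually consistent set of them to roll back to.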


Semantics of recovery lines for backward recovery in distributed systems

1995

Jerzy Brzeziński Jean-Michel Helary Michel Raynal

This paper addresses the definition of recovery lines in the context of backward recovery whose aim is to cope with failures in distributed systems. A general framework that allows for several semantics of recovery lines is introduced. Key notions such as missing messages and orphan messages are precisely defined and their impact on the definition of consistency of recovery lines is carefully an...
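
The notions of orphan and missing messages can be sketched with the commonly used textbook definitions: an orphan message is recorded as received but not as sent in the candidate recovery line, and a missing (in-transit) message is recorded as sent but not as received. The paper's own framework admits several semantics, so the sketch below is only one possible reading; the `Message` type, the event indexing, and the `line` encoding are assumptions.

```python
from typing import Dict, List, NamedTuple


class Message(NamedTuple):
    sender: str
    send_event: int   # index of the send event in the sender's local history
    receiver: str
    recv_event: int   # index of the receive event in the receiver's history


def orphans(line: Dict[str, int], messages: List[Message]) -> List[Message]:
    """Messages whose receive is inside the line but whose send is not.

    line[p] is the number of events of process p covered by p's checkpoint,
    so event i of p is covered exactly when i < line[p].
    """
    return [m for m in messages
            if m.recv_event < line[m.receiver]
            and m.send_event >= line[m.sender]]


def missing(line: Dict[str, int], messages: List[Message]) -> List[Message]:
    """Messages whose send is inside the line but whose receive is not."""
    return [m for m in messages
            if m.send_event < line[m.sender]
            and m.recv_event >= line[m.receiver]]


# One message: P1's event 2 sends it, P2's event 1 receives it.
msgs = [Message("P1", 2, "P2", 1)]

bad_line = {"P1": 2, "P2": 2}      # receive covered, send not covered
good_line = {"P1": 3, "P2": 2}     # both send and receive covered
transit_line = {"P1": 3, "P2": 1}  # send covered, receive not covered
```

Under these assumed definitions, `orphans(bad_line, msgs)` returns the message, `orphans(good_line, msgs)` is empty, and `missing(transit_line, msgs)` returns the message. Whether a line with missing messages still counts as consistent depends on the semantics chosen.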


Handling Recurrent Failures in Coordinated Checkpointing for Mobile Distributed Systems

2016

Maridul Kothari Parveen Kumar

We propose a minimum-process coordinated checkpointing algorithm for non-deterministic mobile distributed systems, where no useless checkpoints are taken. An effort has been made to minimize the blocking of processes and synchronization message overhead. We capture the partial transitive dependencies during the normal execution by piggybacking dependency vectors onto computation messages. Frequ...


FTOP: A Library for Fault Tolerance in a Cluster

2002

R. Badrinath R. Gupta N. Shrivastava

Checkpointing and rollback recovery is a simple technique for fault tolerance. The state of a process is saved on a disk file from which the process can recover on the occurrence of failure. In this paper we describe the implementation of FTOP (Fault Tolerant PVM), a coordinated checkpointing library integrated with PVM. Existing PVM applications require only minor change for incorporating faul...


A New High Performance Checkpointing Approach for Mobile Computing Systems

2006

Bidyut Gupta Shahram Rahimi Ziping Liu

In this paper, we present a single phase non-blocking coordinated checkpointing algorithm suitable for mobile computing environments. The distinct advantages that make the proposed algorithm suitable for distributed mobile computing systems are the following. It produces a consistent set of checkpoints, without the overhead of taking temporary checkpoints; the algorithm makes sure that only min...


An Efficient Optimistic Message Logging Scheme for the Recoverable Mobile Computing Systems

2007

Taesoon Park Namyoon Woo Heon Y. Yeom

This paper presents an efficient scheme to implement the optimistic message logging and the asynchronous recovery for the mobile computing environment. Most of the coordinated checkpointing schemes may not be suitable for the mobile environment, since the unreliable mobile hosts and the fragile network connection may hinder any kind of coordination for checkpointing and recovery. In this paper,...


Improving the Performance of Coordinated Checkpointers on Networks of Workstations using RAID Techniques

1996

James S. Plank

Coordinated checkpointing systems are popular and general-purpose tools for implementing process migration, coarse-grained job swapping, and fault-tolerance on networks of workstations. Though simple in concept, there are several design decisions concerning the placement of checkpoint files that can impact the performance and functionality of coordinated checkpointers. Although several such che...


A Review of Checkpointing Fault Tolerance Techniques in Distributed Mobile Systems

2010

Rachit Garg Praveen Kumar

Fault Tolerance Techniques enable systems to perform tasks in the presence of faults. A checkpoint is a local state of a process saved on stable storage. In a distributed system, since the processes in the system do not share memory, a global state of the system is defined as a set of local states, one from each process. In case of a fault in distributed systems, checkpointing enables the execu...


On Mobile Checkpointing using Index and Time Together

2012

Awadhesh Kumar Singh

Checkpointing is one of the commonly used techniques to provide fault-tolerance in distributed systems so that the system can operate even if one or more components have failed. However, mobile computing systems are constrained by low bandwidth, mobility, lack of stable storage, frequent disconnections and limited battery life. Hence, checkpointing protocols having lesser number of synchronizat...


Dealing with Frequent Aborts in Minimum-process Coordinated Checkpointing Algorithm for Mobile Distributed Systems

Journal: International Journal of Computer Applications, 2010

