coordinated checkpointing

Search results for: coordinated checkpointing

Number of results: 48092

Characterization of Consistent Global Checkpoints in Large-Scale Distributed Systems

1995

Roberto Baldoni, Jerzy Brzezinski, Jean-Michel Hélary, Achour Mostéfaoui, Michel Raynal

Backward error recovery is one of the most widely used schemes for ensuring fault tolerance in distributed systems. Upon the occurrence of a failure, it restores a distributed computation to an error-free global state from which it can be resumed to produce correct behaviour. Checkpointing is one of the techniques used to implement backward error recovery. As we consider large-scale distribut...
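
The abstract above is truncated, so the paper's own characterization is not reproduced here. As a rough illustration of what makes a global state "error-free" for recovery, the following minimal sketch checks the textbook no-orphan-message condition: a set of local checkpoints is consistent if no message is recorded as received without also being recorded as sent. The names `Message` and `is_consistent`, and the use of per-process event indices, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    send_index: int   # local event index at the sender when the message was sent
    receiver: str
    recv_index: int   # local event index at the receiver when it was delivered


def is_consistent(cut, messages):
    """Return True if the global checkpoint `cut` has no orphan message.

    `cut` maps each process name to the local event index of its checkpoint
    (hypothetical representation chosen for this sketch).
    """
    for m in messages:
        received_in_cut = m.recv_index <= cut[m.receiver]
        sent_in_cut = m.send_index <= cut[m.sender]
        if received_in_cut and not sent_in_cut:
            # The receiver remembers a message the sender never sent: orphan.
            return False
    return True


if __name__ == "__main__":
    msgs = [Message("P1", 3, "P2", 2)]
    print(is_consistent({"P1": 2, "P2": 2}, msgs))
    print(is_consistent({"P1": 3, "P2": 2}, msgs))
```

In the first call the checkpoint of P1 precedes the send while the checkpoint of P2 already includes the receive, so rolling back to that pair would leave an orphan message; moving P1's checkpoint past the send removes it.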

Reverse Time Migration with Optimal Checkpointing

2006

William W. Symes

The optimal checkpointing algorithm (Griewank and Walther, 2000) minimizes the computational complexity of the adjoint state method. Applied to reverse time migration, optimal checkpointing eliminates (or at least drastically reduces) the need for disk I/O, which is quite extensive in more straightforward implementations. This paper describes optimal checkpointing in a form which applies both t...

Using Reflection for Checkpointing Concurrent Object Oriented Programs

1998

Mangesh Kasbekar, Chandramouli Narayanan, Chita R Das

This paper presents a reflective approach to checkpointing concurrent object oriented programs. We describe a checkpointing and rollback library for multithreaded programs written in C++. We demonstrate some of the unique features offered by this library, such as selective checkpointing and selective rollbacks of threads of a process that are achievable only through the use of reflection.

Metapromela: A Toolkit for Simulation of Checkpointing Algorithms

2007

Gustavo Maciel Dias Vieira

Distributed checkpointing algorithms play an important role in the majority of fault-tolerant software components that exist today. Unfortunately, there is a lack of comprehensive and uniform performance testing of those algorithms. Our research focuses on the provision of a toolkit, Metapromela, that helps with the implementation and testing of distributed checkpointing algorithms. This pape...

Optimizing Checkpointing Performance in Spark

Journal: DEStech Transactions on Computer Science and Engineering, 2017

Fault Tolerant Matrix Operations for Networks of Workstations Using Multiple Checkpointing

1997

Youngbae Kim, James S. Plank, Jack J. Dongarra

Recently, an algorithm-based approach using diskless checkpointing has been developed to provide fault tolerance for high-performance matrix operations. With this approach, since fault tolerance is incorporated into the matrix operations, the matrix operations become resilient to any single processor failure or change with low overhead. In this paper, we present a technique called multiple chec...

A High Performance Non-Blocking Checkpointing/Recovery Algorithm For Ring Networks

2006

Bidyut Gupta, Namdar Mogharreban, Shahram Rahimi, A. Vemuri

In this paper, we propose a new checkpointing/recovery algorithm for the ring network architecture. The checkpointing algorithm produces a consistent set of checkpoints in a unidirectional network with the help of only a few control messages and, unlike most existing checkpointing algorithms, avoids the overhead of taking temporary checkpoints. The number of interrupts to the processes i...

CCNCheck: Enabling Checkpointed Distributed Applications in Content Centric Networks

Journal: CoRR, 2015

Nitinder Mohan, Pushpendra Singh

We consider the problem of checkpointing a distributed application efficiently in Content Centric Networks so that it can withstand transient failures. We present CCNCheck, a system which enables a sender-optimized way of checkpointing distributed applications in CCNs and provides an efficient mechanism for failure recovery in such applications. CCNCheck’s checkpointing mechanism is a fork of ...

Nonblocking Checkpointing for Optimistic Parallel Simulation: Description and an Implementation

Journal: IEEE Trans. Parallel Distrib. Syst., 2003

Francesco Quaglia, Andrea Santoro

This paper describes a non-blocking checkpointing mode in support of optimistic parallel discrete event simulation. This mode allows real concurrency in the execution of state saving and other simulation-specific operations (e.g., event list update, event execution), with the aim of removing the cost of recording state information from the completion time of the parallel simulation application. ...

An Overview of Checkpointing in

1997

James S. Plank, John G. Webster

Checkpointing is the act of saving the state of a running program so that it may be reconstructed later in time. It is an important basic functionality in computing systems that paves the way for powerful tools in many fields of computer science. This article provides a comprehensive overview of checkpointing in uniprocessor and parallel processing systems, including definitions, uses of checkpoin...
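
The definition in the abstract above (save the state of a running program so that it can be reconstructed later) can be illustrated with a minimal application-level sketch. It is not taken from the article: the function names, the `{"step", "acc"}` state layout, and the `stop_after` parameter used to simulate a crash are all assumptions made for this example. It relies only on the standard-library `pickle`, `tempfile`, and `os` modules.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write `state` atomically: dump to a temp file, then rename over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())      # push the bytes to disk before the rename
    os.replace(tmp_path, path)    # readers see either the old or the new file


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every, stop_after=None):
    """Sum 0..total_steps-1, checkpointing periodically.

    `stop_after` is a hypothetical knob that simulates a crash by
    returning None once that many steps have completed.
    """
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if stop_after is not None and state["step"] >= stop_after:
            return None           # simulated failure; unsaved progress is lost
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)
    return state["acc"]


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
    run(ckpt, 10, 3, stop_after=7)   # "crashes" after step 7
    print(run(ckpt, 10, 3))          # resumes from the checkpoint taken at step 6
```

Writing to a temporary file and renaming it keeps a crash during the save from corrupting the previous checkpoint, which is the property a restart depends on.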
