Search results for: coordinated checkpointing

Number of results: 48092

1995
Roberto Baldoni Jerzy Brzezinski Jean-Michel Hélary Achour Mostéfaoui Michel Raynal

Backward error recovery is one of the most used schemes to ensure fault-tolerance in distributed systems. It consists, upon the occurrence of a failure, in restoring a distributed computation in an error-free global state from which it can be resumed to produce a correct behaviour. Checkpointing is one of the techniques to pursue the backward error recovery. As we consider large-scale distribut...

2006
William W. Symes

The optimal checkpointing algorithm (Griewank and Walther, 2000) minimizes the computational complexity of the adjoint state method. Applied to reverse time migration, optimal checkpointing eliminates (or at least drastically reduces) the need for disk i/o, which is quite extensive in more straightforward implementations. This paper describes optimal checkpointing in a form which applies both t...

1998
Mangesh Kasbekar Chandramouli Narayanan Chita R Das

This paper presents a reflective approach to checkpointing concurrent object oriented programs. We describe a checkpointing and rollback library for multithreaded programs written in C++. We demonstrate some of the unique features offered by this library, such as selective checkpointing and selective rollbacks of threads of a process that are achievable only through the use of reflection.

2007
Gustavo Maciel Dias Vieira

Distributed checkpointing algorithms play an important role in the majority of the fault tolerant software components existent today. Unfortunately, there is a lack of comprehensive and uniform performance testing of those algorithms. Our research focuses on the provision of a toolkit, Metapromela, that helps with the implementation and testing of distributed checkpointing algorithms. This pape...

Journal: DEStech Transactions on Computer Science and Engineering 2017

1997
Youngbae Kim James S. Plank Jack J. Dongarra

Recently, an algorithm-based approach using diskless checkpointing has been developed to provide fault tolerance for high-performance matrix operations. With this approach, since fault tolerance is incorporated into the matrix operations, the matrix operations become resilient to any single processor failure or change with low overhead. In this paper, we present a technique called multiple chec...

2006
Bidyut Gupta Namdar Mogharreban Shahram Rahimi A. Vemuri

In this paper, we have proposed a new checkpointing / recovery algorithm for ring network architecture. The checkpointing algorithm produces a consistent set of checkpoints in a uni-directional network with the help of few control messages and also avoids the overhead of taking temporary checkpoints unlike most other existing checkpointing algorithms. The number of interrupts to the processes i...

Journal: CoRR 2015
Nitinder Mohan Pushpendra Singh

We consider the problem of checkpointing a distributed application efficiently in Content Centric Networks so that it can withstand transient failures. We present CCNCheck, a system which enables a sender optimized way of checkpointing distributed applications in CCN’s and provides an efficient mechanism for failure recovery in such applications. CCNCheck’s checkpointing mechanism is a fork of ...

Journal: IEEE Trans. Parallel Distrib. Syst. 2003
Francesco Quaglia Andrea Santoro

This paper describes a non-blocking checkpointing mode in support of optimistic parallel discrete event simulation. This mode allows real concurrency in the execution of state saving and other simulation specific operations (e.g. event list update, event execution), with the aim at removing the cost of recording state information from the completion time of the parallel simulation application. ...

1997
James S. Plank John G. Webster

Checkpointing is the act of saving the state of a running program so that it may be reconstructed later in time. It is an important basic functionality in computing systems that paves the way for powerful tools in many fields of computer science. This article provides a comprehensive overview of checkpointing in uniprocessor and parallel processing systems, including definitions, uses of checkpoin...
