Search results for: coordinated checkpointing

Number of results: 48092

Journal: Softw., Pract. Exper. 1999
Shang-Te Hsu Ruei-Chuan Chang

Process checkpointing is a procedure that periodically saves process state to stable storage. Most checkpointing facilities use hard disks for archiving. However, disk seek time is limited by the speed of the read-write heads, so checkpointing a process to a local disk requires extensive disk bandwidth. In this paper, we propose an approach that exploits the memory on idle work...

1997
James S. Plank Michael Puening Kai Li Michael A. Puening

The precursor to this work (where diskless checkpointing was first described) was presented at FTCS-24 [27]. Abstract: Diskless checkpointing is a technique for checkpointing the state of a long-running computation on a distributed system without relying on stable storage. As such, it eliminates the performance bottleneck of traditional checkpointing on distributed systems. In this paper, we motiva...

1998
James E. Lumpp

For long-running or large-scale distributed programs, the ability to provide software fault tolerance via checkpointing is valuable. For scalable systems, multicast communication is becoming a predominant communication paradigm. While some aspects of consistency and channel state are the same for both unicast and multicast protocols, the implementation of checkpointing systems differs. This pape...

1996
Mukesh Singhal

Checkpointing algorithms are classified as synchronous and asynchronous in the literature. In synchronous checkpointing, processes synchronize their checkpointing activities so that a globally consistent set of checkpoints is always maintained in the system. Synchronizing checkpointing activity involves message overhead and process execution may have to be suspended during the checkpointing coor...

1994
Micah Beck James S. Plank Gerry Kingsley

In this paper we present compiler-assisted checkpointing, a new technique which uses static program analysis to optimize the performance of checkpointing. We achieve this performance gain using libckpt, a checkpointing library which implements memory exclusion in the context of user-directed checkpointing. The correctness of user-directed checkpointing is dependent on program analysis and inser...

Journal: IJHPCA 2011
Rinku Gupta Harish Gapanati Naik Peter H. Beckman

Providing fault tolerance in high-end petascale systems, consisting of millions of hardware components and complex software stacks, is becoming an increasingly challenging task. Checkpointing continues to be the most prevalent technique for providing fault tolerance in such high-end systems. Considerable research has focussed on optimizing checkpointing; however, in practice, checkpointing stil...

Journal: Electr. Notes Theor. Comput. Sci. 2015
Matthew Forshaw A. Stephen McGough Nigel Thomas

Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow the execution of long-running computational tasks on compute resources subject to hardware or software failures as well as interruptions from resource owners and more important tasks. Until recently many researchers have focused on the performance gains achieved through checkpoint...

2004
L. M. Silva J. G. Silva

Checkpointing and rollback recovery is a very effective technique for tolerating failures. Usually, checkpoint data is saved on disk; however, in some situations the time to write the data to disk can represent a considerable performance overhead. Alternative solutions would make use of main memory to maintain the checkpoint data. The paper starts by presenting two main memory ch...

Journal: Theoretical Computer Science 2003

1996
D. Manivannan Mukesh Singhal

Checkpointing algorithms are classified as synchronous and asynchronous in the literature. In synchronous checkpointing, processes synchronize their checkpointing activities so that a globally consistent set of checkpoints is always maintained in the system. Synchronizing checkpointing activity involves message overhead and process execution may have to be suspended during the checkpointing coor...
