Search results for: coordinated checkpointing

Number of results: 48092

2005
Thomas Huining Feng

In this project, incremental checkpointing is developed specifically for Java programs. This checkpointing scheme has a flavor of source code refactoring: it performs almost all of the (rule-based) transformation automatically, requiring little (or, in many cases, no) interaction with the programmer. Incremental checkpointing is based on a logging technique that records the changes in state instead of ...

1998
James S. Plank Michael G. Thomason

Performance prediction of checkpointing systems in the presence of failures is a well-studied research area. This paper makes three small contributions to this research area. First, we show how to apply the concept of availability from reliability theory as a useful metric for checkpointing systems. Second, we study the average availability of uniprocessor checkpointing systems, using the libck...

2008
Ch. D. V. Subba Rao M. M. Naidu

Checkpointing schemes facilitate fault recovery in distributed systems. The two-level fault recovery scheme for distributed systems inherits the merits of both disk-based and diskless checkpointing schemes. The present work extends James S. Plank’s diskless checkpointing scheme (N+1 Parity) by introducing a ‘Timeout’ to checkpoint programs with high locality of reference. This mechanism enables ap...

2010
Subba Rao Sai Krishna

Checkpointing and message logging are popular, general-purpose tools for providing fault tolerance in distributed systems. Diskless checkpointing schemes enable frequent checkpointing without a performance penalty. The present work extends James S. Plank’s diskless checkpointing scheme (N+1 Parity) by introducing a ‘Timeout’ mechanism to checkpoint programs with high locality of reference. T...

Journal: IEEE Trans. Computers 2001
Yibei Ling Jie Mi Xiaola Lin

Checkpointing is an effective fault-tolerant technique for improving system availability and reliability. However, a blind checkpointing placement can result in either performance degradation or expensive recovery cost. By means of the calculus of variations, we derive an explicit formula that links the optimal checkpointing frequency with a general failure rate, with the objective of globally...

2012
Vaithiyanathan Sundaram

Improving fault tolerance within clusters has become vital because of the drastic decrease in the Mean Time Between Failures (MTBF) in complex clusters. Checkpointing is one of the robust ways to improve fault tolerance by rolling back to a saved state in the event of a failure in the cluster. Since checkpointing primarily relies on storage devices for storing the states at regular inter...

Journal: Journal of Parallel and Distributed Computing 2014

2015
Guillaume Aupy Yves Robert

In this chapter, we present scheduling algorithms to cope with faults on large-scale parallel platforms. We study checkpointing and show how to derive the optimal checkpointing period. Then we explain how to combine checkpointing with fault prediction, and discuss how the optimal period is modified when this combination is used. Finally, we follow the very same approach for the combination o...

2003
Aurelien Bouteiller Pierre Lemarinier Géraud Krawezik Franck Cappello

MPI is one of the most widely adopted programming models for large cluster and Grid deployments. However, these systems often suffer from network or node failures. This raises the issue of selecting a fault tolerance approach for MPI. Automatic and transparent approaches are based on either coordinated checkpointing or message logging associated with uncoordinated checkpointing. There are many protocols, imple...

2010
Zizhong Chen

Checkpointing is a typical approach to tolerating failures in today’s supercomputing clusters and computational grids. Checkpoint data can be saved in central stable storage, in processor memory (as in diskless checkpointing), or in local disk space (replacing memory with local disk in diskless checkpointing). But where to save the checkpoint data has a great impact on the performance of a...
