Search results for: coordinated checkpointing
Number of results: 48092
This paper presents a comprehensive evaluation of aperiodic time-based checkpointing and rejuvenation schemes that maximize the steady-state system availability of an operational software system. We consider two kinds of maintenance policies: checkpointing prior to rejuvenating (CPTR) and rejuvenating prior to checkpointing (RPTC). These schemes are complementary to each other to schedule checkpoi...
Checkpointing and closed nesting are mechanisms typically used for implementing partial roll-back in transactional systems. Closed nesting limits the amount of work to redo on an abort by allowing sub-transactions to abort and retry independently from their parents. Checkpointing goes further and allows a transaction to be rolled back to any previous point where a checkpoint was saved. Checkpoi...
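The partial roll-back idea in the abstract above can be illustrated with a minimal sketch. It assumes an in-memory key-value store and an undo log; the class `CheckpointedTransaction` and its methods are hypothetical names chosen for illustration and are not taken from the paper.

```python
class CheckpointedTransaction:
    """Illustrative sketch: roll a transaction back to any saved checkpoint."""

    def __init__(self, store):
        self.store = store      # shared dict modified in place (assumption)
        self.log = []           # undo log of (key, key_existed, old_value)
        self.savepoints = []    # undo-log length recorded at each checkpoint

    def write(self, key, value):
        # Record the previous value so the write can be undone later.
        self.log.append((key, key in self.store, self.store.get(key)))
        self.store[key] = value

    def checkpoint(self):
        # Remember how long the undo log is right now; return a checkpoint id.
        self.savepoints.append(len(self.log))
        return len(self.savepoints) - 1

    def rollback_to(self, cp_id):
        # Undo writes in reverse order until the log is back at the checkpoint.
        target = self.savepoints[cp_id]
        while len(self.log) > target:
            key, existed, old = self.log.pop()
            if existed:
                self.store[key] = old
            else:
                self.store.pop(key, None)
        # Checkpoints taken after cp_id are no longer valid.
        del self.savepoints[cp_id + 1:]


store = {"x": 1}
tx = CheckpointedTransaction(store)
tx.write("x", 2)
cp = tx.checkpoint()
tx.write("x", 3)
tx.write("y", 9)
tx.rollback_to(cp)   # only the work done after the checkpoint is undone
```

Unlike aborting the whole transaction, the roll-back here keeps the write made before the checkpoint and discards only the later ones.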
We design and implement a high availability parallel run-time system, ChaRM64, a Checkpoint-based Rollback Recovery and Migration system for parallel programs running on a cluster of IA-64 computers. First, we discuss our solution for a user-level, single process checkpoint/recovery library running on IA-64 systems. Based on this library, ChaRM64 is realized, which implements a user-transpare...
Intel’s chip design runs in a large-scale globally distributed environment with 600,000 cores. In the current semiconductor market, a combination of factors such as time-to-market pressure, explosive growth in the mobile market segment and upcoming new markets has led to a significant increase in the demand for, and the required reliability of, computing resources. Checkpointing is a capability that c...
Applications that generate bursty I/O load, like checkpointing, require additional support to perform efficiently on next generation petascale supercomputers. Tens of thousands of processors, generating terabytes of snapshot data at once at each timestep, can easily overwhelm a storage system. Further, even at the current peak I/O bandwidth rates, offered by parallel file system deployments at ...
As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures, enabling applications to periodically save their state and restart computation after a failure. Although a variety of automated system-level ...
Exascale computers are predicted to emerge by the end of this decade with millions of nodes and billions of concurrent cores/threads. One of the most critical challenges for exascale computing is how to effectively and efficiently maintain the system reliability. Checkpointing is the state-of-the-art technique for high-end computing system reliability that has proved to work well for current pet...
1 Disaster Survival Guide in Petascale Computing: An Algorithmic Approach (Jack J. Dongarra, Zizhong Chen, George Bosilca, and Julien Langou); 1.1 FT-MPI: A fault tolerant MPI implementation; 1.1.1 FT-MPI Overview; 1.1.2 FT-MPI: A Fault Tolerant MPI Implementation; 1.1.3 FT-MPI Usage; 1....
Today, the scale of high-performance computing (HPC) systems is larger than ever. This poses a challenge to the fault tolerance of HPC systems. MPI (Message Passing Interface) is one of the most important programming tools for HPC. There are quite a few fault-tolerant extensions for MPI, such as MPICH-V, StarFish, FT-MPI and so on. Most of them are based on on-disk checkpointing. In this pape...
Checkpointing is a basic mechanism for backward error-recovery in fault-tolerant systems. A checkpointed process periodically stops execution and saves its state to files. To reduce the file sizes, only data modified between two consecutive checkpoint times is saved. However, existing approaches do not consider operating system paging activities, which, if ignored, may double the number of disk...
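The incremental scheme described in the abstract above, where only data modified since the previous checkpoint is saved, can be sketched as follows. This is a simplified illustration that assumes the process state is a dictionary and that checkpoints are kept in memory rather than in files; `IncrementalCheckpointer` is a hypothetical name, and the sketch does not model the paging effects the abstract refers to.

```python
import copy


class IncrementalCheckpointer:
    """Illustrative sketch: each checkpoint stores only entries changed since the last one."""

    def __init__(self, state):
        self.state = dict(state)
        self.dirty = set(self.state)  # everything is unsaved before the first checkpoint
        self.checkpoints = []         # list of deltas, oldest first

    def write(self, key, value):
        self.state[key] = value
        self.dirty.add(key)           # mark the entry as modified

    def checkpoint(self):
        # Save only the modified entries, then reset the dirty set.
        delta = {k: copy.deepcopy(self.state[k]) for k in self.dirty}
        self.checkpoints.append(delta)
        self.dirty.clear()
        return len(delta)             # number of entries written

    def recover(self):
        # Replay deltas in order; later deltas overwrite earlier values.
        restored = {}
        for delta in self.checkpoints:
            restored.update(copy.deepcopy(delta))
        return restored


cp = IncrementalCheckpointer({"a": 1, "b": 2, "c": 3})
first = cp.checkpoint()    # full save: all three entries
cp.write("b", 20)
second = cp.checkpoint()   # incremental save: only "b"
cp.write("c", 99)          # modified after the last checkpoint, so lost on recovery
recovered = cp.recover()
```

The second checkpoint is smaller than the first because only one entry changed, which is the file-size saving the abstract describes.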