coordinated checkpointing

Search results for: coordinated checkpointing

Number of results: 48092

Loosely coordinated coscheduling in the context of other approaches for dynamic job scheduling: a survey

Journal: Concurrency and Computation: Practice and Experience, 2005

Angela C. Sodan

Loosely coordinated (implicit/dynamic) coscheduling is a time-sharing approach that originates from network of workstations environments of mixed parallel/serial workloads and limited software support. It is meant to be an easy-to-implement and scalable approach. Considering that the percentage of clusters in parallel computing is increasing and easily portable software is needed, loosely coord...

Analysis of checkpointing for schedulability of real-time systems

1997

Sasikumar Punnekkat, Alan Burns

Checkpointing is a relatively cost effective method for achieving fault tolerance in real-time systems. Since checkpointing schemes depend on time redundancy, they could affect the correctness of the system by causing deadlines to be missed. This paper provides exact schedulability tests for fault tolerant task sets under specified failure hypothesis and employing checkpointing to assist in fau...

Compiler-Enhanced Incremental Checkpointing

2007

Greg Bronevetsky, Daniel Marques, Keshav Pingali, Radu Rugina

As modern supercomputing systems reach the peta-flop performance range, they grow in both size and complexity. This makes them increasingly vulnerable to failures from a variety of causes. Checkpointing is a popular technique for tolerating such failures in that it allows applications to periodically save their state and restart the computation after a failure. Although a variety of automated s...

A Secure Checkpointing Protocol for Survivable Server Design

2004

Vamsi Kambhampati, Indrajit Ray, Eunjong Kim

Secure checkpointing appears to be a useful technique for designing survivable systems. These are fault-tolerant systems that are robust against malicious security attacks. Secure checkpointing, however, is not easily done. Without adequate protection, the checkpointing process can be attacked and compromised. The checkpointing data can be subjected to malicious attacks and be a source of secur...

Asynchronous Two-level Checkpointing Scheme for Large-scale Adjoints in the Spectral-element Solver Nek5000

2016

Michel Schanen, Oana Marin, Hong Zhang, Mihai Anitescu

Adjoints are an important computational tool for large-scale sensitivity evaluation, uncertainty quantification, and derivative-based optimization. An essential component of their performance is the storage/recomputation balance in which efficient checkpointing methods play a key role. We introduce a novel asynchronous two-level adjoint checkpointing scheme for multistep numerical time discreti...

Automatic Parallel Program Checkpointing in Message-Passing Environments

2007

Andrey Smirnov

The efficient use of cluster resources is very important because of the high demand for parallel computation. Checkpointing allows cluster computing time to be managed more efficiently. This article discusses the problems of checkpointing parallel programs and presents the implementation of an automatic parallel checkpointing system for MPI programs. It is based on simple user-space portable chec...

Architecture Support for Behavior-based Adaptive Checkpointing

Journal: JSW, 2008

Nianen Chen, Shangping Ren

Checkpointing is a commonly used approach to provide system fault-tolerance. However, using a constant checkpointing frequency may compromise the system’s overall performance when there are multiple types of QoS requirements involved. Hence, it is important that the checkpointing frequency is customizable and runtime adaptable. However, for open distributed and embedded applications, often ther...

The performance of independent checkpointing in distributed systems

1995

Pierre Sens

This paper describes performance measurements of an implementation of independent checkpointing in a network of workstations. Independent checkpointing is a simple technique for providing fault tolerance in distributed systems. Because processes do not coordinate during checkpointing, this technique has a low run-time overhead. To avoid the classical domino effect, our implementation relies on a...

Application Level Fault Tolerance in Heterogeneous Networks of Workstations

Journal: J. Parallel Distrib. Comput., 1997

Adam Beguelin, Erik Seligman, Peter Stephan

We have explored methods for checkpointing and restarting processes within the Distributed object migration environment (Dome), a C++ library of data parallel objects that are automatically distributed over heterogeneous networks of workstations (NOWs). System level checkpointing methods, although transparent to the user, were rejected because they lack support for heterogeneity. We have implem...

Modular Checkpointing for Atomicity

Journal: Electronic Notes in Theoretical Computer Science, 2007
