coordinated checkpointing

Search results for: coordinated checkpointing

Number of results: 48092

A Study of Mutable Checkpointing Approach to Reduce the Overheads Associated with Coordinated Checkpointing

Journal: The SIJ Transactions on Computer Networks & Communication Engineering 2013

Defining the Checkpoint Interval for Uncoordinated Checkpointing Protocols

2011

Leonardo Fialho, Dolores Rexachs

Parallel applications running on large computers suffer from the absence of a reliable environment. Fault tolerance proposals, in general, rely on rollback-recovery strategies supported by checkpoint and/or message logging. There are well-defined models that address the optimum checkpoint interval for coordinated checkpointing. Nevertheless, there is a lack of models concerning uncoordinated ch...
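As an illustration of what an optimum-interval model for coordinated checkpointing can look like, the sketch below uses Young's first-order approximation, interval = sqrt(2 × checkpoint cost × MTBF). This is a well-known textbook model and is not taken from the paper listed above; the function name and the sample values (60 s checkpoint cost, 24 h mean time between failures) are hypothetical.

```python
import math


def young_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Young's first-order approximation of the optimum checkpoint interval.

    checkpoint_cost_s: time to write one checkpoint, in seconds (assumed input).
    mtbf_s: mean time between failures of the whole system, in seconds.
    """
    if checkpoint_cost_s <= 0 or mtbf_s <= 0:
        raise ValueError("checkpoint cost and MTBF must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


# Hypothetical values: 60 s per checkpoint, one failure per day on average.
interval_s = young_interval(60.0, 24 * 3600.0)
print(f"checkpoint roughly every {interval_s / 60:.1f} minutes")
```

The approximation assumes the checkpoint cost is small compared with the MTBF; higher-order models refine it when that assumption does not hold.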

A novel min-process checkpointing scheme for mobile computing systems

Journal: Journal of Systems Architecture 2005

Guohui Li, Hongya Wang

In distributed computing systems, processes in different hosts take checkpoints to survive failures. For mobile computing systems, due to certain new characteristics such as mobility, low bandwidth, disconnection, low power consumption and limited memory, conventional distributed checkpointing schemes need to be reconsidered. In this paper, a novel min-process coordinated checkpointing algorith...

Adaptive Two-Level Blocking Coordinated Checkpointing for High Performance Cluster Computing Systems

Journal: J. Inf. Sci. Eng. 2010

Mehdi Lotfi, Seyed Ahmad Motamedi

Blocking coordinated checkpointing is a well-known method for achieving fault tolerance in cluster computing systems. In this work, we introduce a new approach for blocking coordinated checkpointing using two-level checkpointing. The first level of checkpointing is local checkpointing, and computing nodes save the checkpoints in local disk. If a transient failure occurs in the computing node, t...

Coordinated Checkpointing-Rollback Error Recovery for Distributed Shared Memory Multicomputers

1994

G. Janakiraman, Yuval Tamir

Most recovery schemes that have been proposed for Distributed Shared Memory (DSM) systems require unnecessarily high checkpointing frequency and checkpoint traffic, which are sensitive to the frequency of interprocess communication in the applications. For message-passing systems, low overhead error recovery based on coordinated checkpointing allows the frequency of checkpointing to be determin...

Accelerating incremental checkpointing for extreme-scale computing

Journal: Future Generation Comp. Syst. 2014

Kurt B. Ferreira, Rolf Riesen, Patrick G. Bridges, Dorian C. Arnold, Ron Brightwell

Concern is beginning to grow in the high-performance computing (HPC) community regarding the reliability of future large-scale systems. Disk-based coordinated checkpoint/restart has been the dominant fault tolerance mechanism in HPC systems for the last 30 years. Checkpoint performance is so fundamental to scalability that nearly all capability applications have custom checkpoint strategies to ...

System Progress Estimation in Time based Coordinated Checkpointing Protocols

Journal: International Journal of Computer Applications 2012

Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI Protocols

Journal: Future Generation Comp. Syst. 2008

Darius Buntinas, Camille Coti, Thomas Hérault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, Franck Cappello

A long-term trend in high-performance computing is the increasing number of nodes in parallel computing platforms, which entails a higher failure probability. Fault tolerant programming environments should be used to guarantee the safe execution of critical applications. Research in fault tolerant MPIs has led to the development of several fault tolerant MPI environments. Different approaches a...

On the Impossibility of Min-Process Non-Blocking Checkpointing and An Efficient Checkpointing Algorithm for Mobile Computing Systems

1998

Guohong Cao, Mukesh Singhal

Mobile computing raises many new issues, such as lack of stable storage, low bandwidth of wireless channel, high mobility, and limited battery life. These new issues make traditional checkpointing algorithms unsuitable. Prakash and Singhal [14] proposed the first coordinated checkpointing algorithm for mobile computing systems. However, we showed that their algorithm may result in an inconsiste...

An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment

Journal: Journal of Computer and Robotics 2013

Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...
