Efficient Checkpointing over Local Area Networks
Abstract
Parallel and distributed computing on clusters of workstations is becoming very popular as it provides a cost-effective way to do high-performance computing. In these systems, the bandwidth of the communication subsystem (using Ethernet technology) is about an order of magnitude smaller than the bandwidth of the storage subsystem. Hence, storing a state in a checkpoint is much more efficient than comparing states over the network. In this paper we present a novel checkpointing approach that enables efficient performance over local area networks. The main idea is that we use two types of checkpoints: compare-checkpoints (comparing the states of the redundant processes to detect faults) and store-checkpoints (where the state is only stored). The store-checkpoints reduce the rollback needed after a fault is detected, without performing many unnecessary comparisons. As a particular example of this approach we analyzed the DMR checkpointing scheme with store-checkpoints. Our main result is that the overhead of the execution time can be significantly reduced when store-checkpoints are introduced. We have implemented a prototype of the new DMR scheme and run it on workstations connected by a LAN. The experimental results we obtained match the analytical results and show that in some cases the overhead of the DMR checkpointing schemes over LANs can be improved by as much
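The trade-off between the two checkpoint types can be illustrated with a small cost model. The Python sketch below is not the paper's analysis or prototype; it is a toy model under stated assumptions. The function name `total_time`, all of its parameters, and the rule for locating the rollback point (a linear backward scan over stored states, charged one comparison per state examined) are hypothetical choices made for illustration only.

```python
def total_time(n, work, t_store, t_compare, m, faults, use_store=True):
    """Toy execution-time model for a duplicated (DMR) task.

    Assumptions (illustrative, not taken from the paper):
      - the task is n intervals of `work` time units each;
      - every m-th interval (and the last one) ends in a compare-checkpoint
        costing t_store + t_compare;
      - with use_store=True, every other interval ends in a store-checkpoint
        costing t_store;
      - `faults` is a set of 1-based counters of executed intervals
        (re-executions included) during which one replica is corrupted;
      - after a mismatch, stored states are compared backwards, one
        t_compare each, until a matching checkpoint is found.
    """
    clock = 0.0
    done = 0            # intervals completed in the current execution
    executed = 0        # intervals executed so far, including repeats
    last_verified = 0   # latest checkpoint known to match on both replicas
    first_bad = None    # first corrupted interval since last_verified

    while done < n:
        executed += 1
        done += 1
        clock += work
        if executed in faults and first_bad is None:
            first_bad = done

        at_compare = done % m == 0 or done == n
        if not at_compare:
            if use_store:
                clock += t_store
            continue

        clock += t_store + t_compare
        if first_bad is None:
            last_verified = done
            continue

        if use_store:
            # Roll back only to the last store-checkpoint before the fault.
            target = first_bad - 1
            # States done-1 ... target are compared; a target that is
            # already verified needs no extra comparison.
            scanned = done - target - (1 if target == last_verified else 0)
            clock += scanned * t_compare
            done = target
            last_verified = target
        else:
            # No stored states in between: redo the whole segment.
            done = last_verified
        first_bad = None

    return clock


# One fault in the 8th executed interval, comparison 20x costlier than storing.
params = dict(n=8, work=10, t_store=1, t_compare=20, faults={8})
print(total_time(m=4, use_store=True, **params))    # 179.0
print(total_time(m=4, use_store=False, **params))   # 183.0
print(total_time(m=1, use_store=True, **params))    # 279.0
```

In this particular scenario, sparse comparisons with store-checkpoints finish earlier than both alternatives. The outcome depends on the parameters: under this model's linear scan, a fault early in a long segment can make the store-checkpoint variant slower, so the numbers should be read as an illustration of the trade-off rather than as a result.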
Similar resources
An Efficient Time-Based Checkpointing Protocol for Mobile Computing Systems over Mobile IP
Time-based coordinated checkpointing protocols are well suited for mobile computing systems because no explicit coordination message is needed while the advantages of coordinated checkpointing are kept. However, without coordination, every process has to take a checkpoint during a checkpointing process. In this paper, an efficient time-based coordinated checkpointing protocol for mobile computi...
Performance Improvement of Expanded Integrated Local Area Networks (RESEARCH NOTE)
In Local Area Networks (LANs) connected together by bridges, flow control and smooth traffic in the network are very important. However, congestion at bridges can cause intensive loss of received frames. In addition, the received frames are thrown away and have to be retransmitted by the source station, which causes more congestion and massive reduction in the overall network throughput. The netw...
A new SDN-based framework for wireless local area networks
Nowadays wireless networks are becoming important in personal and public communication and are growing very rapidly. Similarly, Software Defined Networking (SDN) is an emerging approach to overcome the challenges of traditional networks. In this paper, a new SDN-based framework is proposed for fine-grained control of 802.11 Wireless LANs. This work describes the benefits of programmable Acc...
An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment
Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...
An Analysis of Checkpointing Algorithms for Distributed Mobile Systems
Distributed snapshots are an important building block for distributed systems, and are useful for constructing efficient checkpointing protocols, among other uses. Direct application of these algorithms to mobile systems is not feasible, however, due to differences in the environment in which mobile systems operate, relative to general distributed systems. The mobile computing environment intro...