Fault-tolerant Clock Synchronization for Distributed Systems with High Message Delay Variation
Authors
Abstract
Fault-tolerant clock synchronization is an important requirement in many distributed systems, especially in time-critical and safety-critical applications. Frequently, interactive convergence algorithms are used for fault-tolerant clock synchronization, providing advantages such as fully distributed operation, low message exchange overhead, and simplicity of implementation. This paper presents the measured performance of three interactive convergence clock synchronization algorithms. Our experiments were conducted in a distributed UNIX environment featuring high message delay variation, which poses severe constraints on the clock synchronization tightness that may be achieved. The algorithms that were tested are: FTMA (fault-tolerant midpoint algorithm) [1], AEFTMA (adaptive exponential averaging fault-tolerant midpoint algorithm) [2], and SWA (sliding window algorithm) [3]. Our experimental results indicate that SWA outperforms the other algorithms in this environment, being able to achieve tighter synchronization under different simulated fault conditions. The superiority of SWA can be attributed to its high degree of fault tolerance, combined with its ability to treat messages with much longer than expected delays as faults.
In distributed systems, computers cooperate to provide the expected functionality to a given application. Some tasks that are often found in such systems are: synchronizing activities that occur at different points of the system, ordering events in time, enforcing deadlines, and measuring elapsed time. A system with one or more of these requirements must use proper synchronization mechanisms to establish an agreed-upon global time scale among its components. Particularly in the case of safety-critical applications, synchrony must be maintained in spite of the presence of faults in the system.
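As a rough illustration of the two ideas named in the abstract, the Python sketch below contrasts a fault-tolerant midpoint correction with a sliding-window style selection. It is a simplified sketch under stated assumptions, not the implementation measured in the paper: the function names, the `offsets` input (estimated remote-minus-local clock differences in seconds, including 0.0 for the node's own clock), the fault bound `f`, the `window` width, and the use of the mean inside the window are all hypothetical choices made for illustration, and the actual algorithms in [1] and [3] may differ in detail.

```python
def ft_midpoint_correction(offsets, f):
    """Midpoint of the readings left after dropping the f lowest and f highest.

    offsets: estimated clock differences (remote - local), in seconds.
    f: assumed maximum number of faulty clocks to tolerate.
    """
    if len(offsets) <= 2 * f:
        raise ValueError("need more than 2*f readings")
    ordered = sorted(offsets)
    trimmed = ordered[f:len(ordered) - f]
    return (trimmed[0] + trimmed[-1]) / 2.0


def sliding_window_correction(offsets, window):
    """Mean of the largest group of readings that fits in a window of fixed width.

    Readings outside the chosen window (for example, ones distorted by an
    unusually long message delay) are ignored, however many there are.
    """
    if not offsets:
        raise ValueError("need at least one reading")
    ordered = sorted(offsets)
    best = []
    for i, start in enumerate(ordered):
        # Window instance anchored at this reading.
        inside = [x for x in ordered[i:] if x - start <= window]
        if len(inside) > len(best):
            best = inside
    return sum(best) / len(best)


# Hypothetical readings: four plausible offsets and one badly delayed one.
readings = [0.0, 0.002, -0.001, 0.003, 5.0]
midpoint = ft_midpoint_correction(readings, f=1)
windowed = sliding_window_correction(readings, window=0.01)
```

With these sample readings, both functions discard the 5.0 s outlier; the difference is that the midpoint variant drops a fixed number of extremes, while the window variant drops whatever falls outside the densest cluster.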
Frequently, fault-tolerant clock synchronization is achieved via interactive convergence algorithms in which nodes exchange their clock values and determine clock correction terms at regular intervals. This paper presents the measured performance of three interactive convergence algorithms: the sliding window algorithm (SWA) [3], the fault-tolerant midpoint algorithm (FTMA) [1], and the adaptive exponential averaging fault-tolerant midpoint algorithm (AEFTMA) [2]. The measurements were carried out with an application-level implementation running in a distributed UNIX environment [2]. This environment poses some practical constraints on clock synchronization, in particular a high variation in the
Similar resources
Fault-Tolerant Clock Synchronization in Environments with High Message Delay Variation
Fault-tolerant clock synchronization is an important requirement in many distributed systems, especially in time-critical and safety-critical applications. Frequently, interactive convergence algorithms are used for fault-tolerant clock synchronization, providing advantages such as fully-distributed operation, low message exchange overhead and simplicity of implementation. This paper presents...
A Model for Distributed Computing in Real-Time Systems
This work introduces a fault-tolerant real-time distributed computing model for message-passing systems, which reconciles the distributed computing and the real-time systems perspective: By just replacing instantaneous computing steps with computing steps of non-zero duration, we obtain a model that both facilitates real-time schedulability analysis and retains compatibility with classic distrib...
Software-based Fault-tolerant Clock Synchronization for Distributed Unix Environments
Fault-tolerant clock synchronization is often used in distributed systems with requirements such as close interaction between its components, measurements of elapsed time and ordering of events in the system. Different implementation approaches can be used to achieve fault-tolerant clock synchronization, depending on criteria such as performance, cost and availability of hardware and operating...
Accuracy of Message Counting Abstraction in Fault-Tolerant Distributed Algorithms
Fault-tolerant distributed algorithms are a vital part of mission-critical distributed systems. In principle, automatic verification can be used to ensure the absence of bugs in such algorithms. In practice however, model checking tools will only establish the correctness of distributed algorithms if message passing is encoded efficiently. In this paper, we consider abstractions suitable for ma...
A Case Study of Clock Synchronization in Flexray
This paper presents a case study on the performance of a distributed clock synchronization algorithm used in Flexray, a communication protocol designed to meet the requirements of dependable, fault-tolerant real-time applications. The Flexray industry consortium drives forward the standardization of a fault-tolerant communication system for advanced automotive applications. In this case study w...