Implementing Fault-tolerant Services Using the State Machine Approach: a Tutorial
Authors
Abstract
Computer systems are increasingly employed in circumstances where their failure (or even their correct operation, if they are built to flawed requirements) can have serious consequences.

There is a surprising diversity of opinion concerning the properties that such "critical systems" should possess, and the best methods to develop them. The dependability approach grew out of the tradition of ultra-reliable and fault-tolerant systems, while the safety approach grew out of the tradition of hazard analysis and system safety engineering. Yet another tradition is found in the security community, and there are further specialized approaches in the tradition of real-time systems. In this report, I examine the critical properties considered in each approach, and the techniques that have been developed to specify them and to ensure their satisfaction.

Since systems are now being constructed that must satisfy several of these critical system properties simultaneously, there is particular interest in the extent to which techniques from one tradition support or conflict with those of another, and in whether certain critical system properties are fundamentally compatible or incompatible with each other. As a step toward improved understanding of these issues, I suggest a taxonomy, based on Perrow's analysis (1), that considers the complexity of component interactions and tightness of coupling as primary factors.

(1) C. Perrow. Normal Accidents: Living with High Risk Technologies. Basic Books, New York, NY, 1984.

Critical System Properties: Survey and Taxonomy
John Rushby, Computer Science Laboratory, SRI International, Menlo Park CA 94025 USA
Technical Report CSL-93-01, May 1993. Revised February 1994.
Original version published in Reliability Engineering and System Safety, Vol. 43, No. 2, pp. 189-219, 1994.
This work was supported by the National Aeronautics and Space Administration Langley Research Center and the US Naval Research Laboratory under contract NAS1-18969 and by the US Naval Research Laboratory under contract N00014-92-C-2177.
Similar resources
The State Machine Approach: A Tutorial
The state machine approach is a general method for achieving fault tolerance and implementing decentralized control in distributed systems. This paper reviews the approach and identifies abstractions needed for coordinating ensembles of state machines. Implementations of these abstractions for two different failure models, Byzantine and fail-stop, are discussed. The state machine approach is il...
Implementing Adaptive Fault-Tolerant Services for Hybrid Faults
The two major approaches to building fault-tolerant services are commonly known as the Primary-Backup approach (PB) and the State-Machine approach (SM). PB can tolerate crash and omission faults and runs more economically than SM, but SM can tolerate more serious faults, including arbitrary or Byzantine faults. Instead of selecting one or the other approach, thus either incurring a high running...
A Guided Tour on the Theory and Practice of State Machine Replication
This chapter presents the fundamentals and applications of the State Machine Replication (SMR) technique for implementing consistent fault-tolerant services. Our focus here is threefold. First we present some fundamentals about distributed computing and three “practical” SMR protocols for different fault models. Second, we discuss some recent work aiming to improve the performance, modularity a...
Fault tolerant system with imperfect coverage, reboot and server vacation
This study is concerned with the performance modeling of a fault tolerant system consisting of operating units supported by a combination of warm and cold spares. The on-line as well as warm standby units are subject to failures and are sent for repair to a repair facility having a single repairman who is prone to failure. If the failed unit is not detected, the system enters into an unsafe...
Implementing Fault-Tolerant Services Using State Machines: Beyond Replication
This paper describes a method to implement fault-tolerant services in distributed systems based on the idea of fused state machines. The theory of fused state machines uses a combination of coding theory and replication to ensure efficiency as well as savings in storage and messages during normal operations. Fused state machines may incur higher overhead during recovery from crash or Byzantine ...