Implementing Fault-tolerant Services Using the State Machine Approach: a Tutorial

Authors

  • John H. Wensley
  • Lui Sha
  • John P. Lehoczky
  • John A. Stankovic
  • Ragunathan Rajkumar
Abstract

Computer systems are increasingly employed in circumstances where their failure (or even their correct operation, if they are built to flawed requirements) can have serious consequences.

There is a surprising diversity of opinion concerning the properties that such "critical systems" should possess, and the best methods to develop them. The dependability approach grew out of the tradition of ultra-reliable and fault-tolerant systems, while the safety approach grew out of the tradition of hazard analysis and system safety engineering. Yet another tradition is found in the security community, and there are further specialized approaches in the tradition of real-time systems. In this report, I examine the critical properties considered in each approach, and the techniques that have been developed to specify them and to ensure their satisfaction.

Since systems are now being constructed that must satisfy several of these critical system properties simultaneously, there is particular interest in the extent to which techniques from one tradition support or conflict with those of another, and in whether certain critical system properties are fundamentally compatible or incompatible with each other. As a step toward improved understanding of these issues, I suggest a taxonomy, based on Perrow's analysis [1], that considers the complexity of component interactions and tightness of coupling as primary factors.

[1] C. Perrow. Normal Accidents: Living with High Risk Technologies. Basic Books, New York, NY, 1984.

Critical System Properties: Survey and Taxonomy
John Rushby
Computer Science Laboratory, SRI International, Menlo Park CA 94025 USA
Technical Report CSL-93-01, May 1993. Revised February 1994.
Original version published in Reliability Engineering and System Safety, Vol. 43, No. 2, pp. 189-219, 1994.
This work was supported by the National Aeronautics and Space Administration Langley Research Center and the US Naval Research Laboratory under contract NAS1-18969 and by the US Naval Research Laboratory under contract N00014-92-C-2177.

Similar resources

The State Machine Approach: A Tutorial

The state machine approach is a general method for achieving fault tolerance and implementing decentralized control in distributed systems. This paper reviews the approach and identifies abstractions needed for coordinating ensembles of state machines. Implementations of these abstractions for two different failure models, Byzantine and fail-stop, are discussed. The state machine approach is il...

Implementing Adaptive Fault-Tolerant Services for Hybrid Faults

The two major approaches to building fault-tolerant services are commonly known as the Primary-Backup approach (PB) and the State-Machine approach (SM). PB can tolerate crash and omission faults and runs more economically than SM, but SM can tolerate more serious faults, including arbitrary or Byzantine faults. Instead of selecting one or the other approach, thus either incurring a high running...

A Guided Tour on the Theory and Practice of State Machine Replication

This chapter presents the fundamentals and applications of the State Machine Replication (SMR) technique for implementing consistent fault-tolerant services. Our focus here is threefold. First we present some fundamentals about distributed computing and three “practical” SMR protocols for different fault models. Second, we discuss some recent work aiming to improve the performance, modularity a...

Fault tolerant system with imperfect coverage, reboot and server vacation

This study is concerned with the performance modeling of a fault tolerant system consisting of operating units supported by a combination of warm and cold spares. The on-line as well as warm standby units are subject to failures and are sent for repair to a repair facility with a single repairman who is prone to failure. If the failed unit is not detected, the system enters into an unsafe...

Implementing Fault-Tolerant Services Using State Machines: Beyond Replication

This paper describes a method to implement fault-tolerant services in distributed systems based on the idea of fused state machines. The theory of fused state machines uses a combination of coding theory and replication to ensure efficiency as well as savings in storage and messages during normal operations. Fused state machines may incur higher overhead during recovery from crash or Byzantine ...

Publication date: 1994