Optimistic Fault Tolerance
Not recorded
Abstract
Traditionally, fault tolerance has been considered a pivotal component of system design. However, as modern commodity hardware becomes increasingly reliable, building dedicated fault tolerance systems becomes unnecessary. Therefore, while in this paper we propose nothing novel and make no technical contributions, we nevertheless identify a unique approach to fault tolerance, which we dub optimistic fault tolerance. We implement optimistic fault tolerance, or OFT, as a C library that supports seamless interoperability with any large-scale system. We provide a survey of real systems that demonstrates the rarity of system failure, and additionally show through benchmarks and simulation that, across numerous workloads, our framework provides tolerance in the face of rare arbitrary failures.
Similar resources
Semantics of Optimistic Computation
We address the issue of deriving a semantically equivalent optimistic computation from a pessimistic computation by application-independent transformations. Computations are modeled by program dependence graphs (PDGs). The semantics of a computation is defined by a mapping from an initial state to a final state, and is realized by a graph rewriting system. Semantics-preserving transformations are ...
Schrödinger's CRCs (Fast
I revisit the fault tolerance of cyclic redundancy checks (CRCs), expanding on the work of Driscoll et al. [1]. I introduce the concepts of Schrödinger-Hamming weight and Schrödinger-Hamming distance, and I argue that under a fault model in which stuck-at-one-half or slightly-out-of-spec faults dominate, current methods for computing the fault detection of CRCs may be over-optimistic. Keywords-c...
Design Issues for Optimistic Distributed Discrete Event Simulation
Simulation is a powerful tool for studying the dynamics of a system. However, simulation is time-consuming. Thus, it is natural to attempt to use multiple processors to speed up the simulation process. Many protocols have been proposed to perform discrete event simulation in multi-processor environments. Most of these distributed discrete event simulation protocols are either conservative or op...
A Cost-Effective and Flexible Scheme for Software Fault Tolerance
A new software fault tolerance scheme, called the Self-Configuring Optimistic Programming (SCOP) scheme, is proposed. It attempts to reduce the cost of fault-tolerant software and to eliminate some inflexibilities and rigidities present in the existing software fault tolerance schemes. To achieve these goals, it is structured in phases in order to produce acceptable results with the minimum...
An Improved Optimistic and Fault-Tolerant Replication Protocol
In this paper, a protocol is proposed that provides the advantages of lazy approaches while forestalling their traditional disadvantages. Thus, our approach reduces the abort rate and improves the performance of the system. It can also use a dynamic computation of the protocol threshold, approximating its results to the optimal ones. In addition, fault tolerance has been included in the...