Somersault Software Fault-Tolerance

Authors

  • Paul Murray
  • Roger Fleming
  • Paul Harry
  • Paul Vickers
Abstract

Keywords: software fault-tolerance, process replication, failure masking, continuous availability, topology

The ambition of fault-tolerant systems is to provide application transparent fault-tolerance at the same performance as a non-fault-tolerant system. Somersault is a library for developing distributed fault-tolerant software systems that comes close to achieving both goals. We describe Somersault and its properties, including:

  1. Fault-tolerance: Somersault implements "process mirroring" within a group of processes called a recovery unit. Failure of individual group members is completely masked.
  2. Abstraction: Somersault provides loss-less messaging between units. Recovery units and single processes are addressed uniformly as single entities. Recovery unit application code is unaware of replication.
  3. High performance: The simple protocol provides throughput comparable to non-fault-tolerant processes at a low latency overhead. There is also sub-second failover time.
  4. Compositionality: The same protocol is used to communicate between recovery units as between single processes, so any topology can be formed.
  5. Scalability: Failure detection, failure recovery and general system performance are independent of the number of recovery units in a software system.

Somersault has been developed at HP Laboratories. At the time of writing it is undergoing industrial trials.
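The abstraction property described in the abstract (a recovery unit and a single process being addressed the same way, with member failure masked from the sender) can be illustrated with a small sketch. This is not Somersault's API or protocol, which the abstract does not specify. The names `Endpoint`, `Process`, `RecoveryUnit` and `send` are hypothetical, and the in-memory "apply every message to every live member" scheme is an assumed stand-in for process mirroring.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol


class Endpoint(Protocol):
    """Hypothetical interface: anything a sender can address as one entity."""

    def deliver(self, msg: str) -> List[str]: ...


@dataclass
class Process:
    """A single, unreplicated process whose state is the list of messages seen."""

    name: str
    alive: bool = True
    log: List[str] = field(default_factory=list)

    def deliver(self, msg: str) -> List[str]:
        if not self.alive:
            raise RuntimeError(f"{self.name} has failed")
        self.log.append(msg)
        return list(self.log)


@dataclass
class RecoveryUnit:
    """Assumed mirroring scheme: each message is applied to every live member."""

    members: List[Process]

    def deliver(self, msg: str) -> List[str]:
        result: Optional[List[str]] = None
        for member in self.members:
            if not member.alive:
                continue  # a failed member is skipped, so the sender never sees it
            state = member.deliver(msg)
            if result is None:
                result = state  # the first live member answers for the unit
        if result is None:
            raise RuntimeError("all members of the recovery unit have failed")
        return result


def send(target: Endpoint, msg: str) -> List[str]:
    """The sender uses one call for single processes and recovery units alike."""
    return target.deliver(msg)


unit = RecoveryUnit([Process("primary"), Process("secondary")])
send(unit, "a")
unit.members[0].alive = False  # simulate failure of one group member
state_after_failure = send(unit, "b")  # the surviving mirror holds both messages
```

Because the surviving member has already applied every earlier message, the sender's view of the unit's state is unchanged by the failure. A real implementation would also need failure detection, ordering guarantees and re-creation of the lost mirror, none of which is modelled here.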

Similar resources

Somersault: Enabling Fault-Tolerant Distributed Software Systems

Keywords: fault-tolerant, CORBA, process replication, process mirroring, high availability

Somersault is a platform for developing distributed fault-tolerant software components and integrating these critical components with other components into distributed system solutions. Critical application processes are mirrored across a network, with each critical process being replicated in a primary and seconda...

An Overview of Software Fault Tolerant Computing

Fault-tolerance is the survival attribute of the real-time software system. The software should provide correct results in the face of various failures. A major technological concern for the coming years is the ever widening gap between the demand for high quality, robust software and its supply. Software fault tolerance is basically the design faults in the computer system. Its function is to ...

On the Extension of Xception to Support Software Fault Models

Software faults are recognized as the major cause of system outages. The two possible approaches to overcome this problem are fault avoidance and fault tolerance. Quality assurance techniques fail to attain the zero defects mark, making fault tolerance vital to assure mission and business critical systems dependability. One major issue is the difficulty in the verification and validation of sof...

Design and Analysis of Transient Fault Tolerance for Multi Core Architecture

This paper describes a software approach to fault tolerance for shared-memory multi-core systems using PLR. PLR uses a software-centric approach to transient fault tolerance that ensures correct software execution. This scheme is used at user-space level and does not necessitate changes to the original application. PLR creates a set of redundant processes per application process. In this scheme ...

Publication date: 1998