Optimal Primary-Backup Protocols
Abstract
One way to implement a fault-tolerant service is to employ multiple sites that fail independently. The state of the service is replicated and distributed among these sites, and updates are coordinated so that even when a subset of the sites fail, the service remains available. A common approach to structuring such replicated services is to designate one site as the primary and all the others as backups. Clients make requests by sending messages only to the primary. If the primary fails, then a failover occurs and one of the backups takes over. This service architecture is commonly called the primary-backup or the primary-copy approach [1]. In [5] we give lower bounds for implementing primary-backup protocols under various models of failure. These lower bounds constrain the degree of replication, the time during which the service can be without a primary, and the amount of time it can take to respond to a client request. In this paper, we show that most of these lower bounds are tight by giving matching protocols. Some of the protocols that we describe have surprising properties. In one case, the optimal protocol is one in which a non-faulty primary is forced to relinquish control to a backup that it knows to be faulty! However, the existence of such a scenario is not peculiar to our protocol. As shown in [5], relinquishing control to a faulty backup is indeed necessary to achieve optimal protocols in some failure models. Another surprise is that in some protocols that achieve optimal response time, the site that receives the request (i.e., the primary) is not the site that sends the response to the clients. We show that this anomaly is not idiosyncratic to our protocols; it is necessary for achieving optimal response time.
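The basic request and failover structure described above can be sketched in Python. This is a minimal illustration under assumptions of our own, not one of the optimal protocols from the paper: sites fail only by crashing, messages are delivered instantly and reliably (modeled as direct method calls), the primary pushes each update to every live backup before answering, and backups take over in a fixed order of increasing site number. The names `Site`, `PrimaryBackupService`, `request`, `crash`, and `read` are hypothetical. The sketch deliberately ignores the timing quantities (time without a primary, response time) that the lower bounds in [5] constrain.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One replica; `state` is a toy key-value store (hypothetical service)."""
    site_id: int
    alive: bool = True
    state: dict = field(default_factory=dict)

    def apply(self, key: str, value: int) -> None:
        self.state[key] = value


class PrimaryBackupService:
    """Toy primary-backup service: site 0 starts as primary, the rest are backups.

    Assumptions (not from the paper): crash failures only, instant and
    reliable message delivery modeled as method calls, and failover to the
    lowest-numbered live site.
    """

    def __init__(self, n_sites: int) -> None:
        self.sites = [Site(i) for i in range(n_sites)]
        self.primary_id = 0

    def _failover(self) -> None:
        # Promote the lowest-numbered live site to primary.
        for site in self.sites:
            if site.alive:
                self.primary_id = site.site_id
                return
        raise RuntimeError("all sites have failed; service unavailable")

    def _current_primary(self) -> Site:
        # Clients talk only to the primary; if it has crashed, fail over first.
        if not self.sites[self.primary_id].alive:
            self._failover()
        return self.sites[self.primary_id]

    def request(self, key: str, value: int) -> str:
        primary = self._current_primary()
        primary.apply(key, value)
        # Propagate the update to every live backup before responding,
        # so whichever backup later takes over already has this update.
        for site in self.sites:
            if site.alive and site is not primary:
                site.apply(key, value)
        return f"ok from site {primary.site_id}"

    def read(self, key: str) -> int:
        return self._current_primary().state[key]

    def crash(self, site_id: int) -> None:
        self.sites[site_id].alive = False


if __name__ == "__main__":
    svc = PrimaryBackupService(n_sites=3)
    print(svc.request("x", 1))  # ok from site 0
    svc.crash(0)                # the primary fails
    print(svc.request("y", 2))  # ok from site 1 (backup 1 took over)
    print(svc.read("x"))        # 1 (the earlier write survived failover)
```

Because every acknowledged update has already reached all live backups, whichever backup is promoted can serve reads of earlier writes. In this toy version the primary both receives the request and sends the response; the abstract notes that some protocols achieving optimal response time must break that coupling.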
Similar resources
Weak Leader Election in the receive-omission failure model
...ions that enables one to more expressively represent failure scenarios by considering failures that are not independent or not identically distributed; 4) although it allows for faulty processes to be elected, correct processes are able to detect it, enabling the use of alarms to indicate failures in the system. Relating to our discussion on Primary-Backup protocols, by assuming that faulty pro...
A Driven Backup Routing Table to Find Alternative Dijoint Path in Ad Hoc Wireless
The performances of the routing protocols are important since they compute the primary path between source and destination. In addition, routing protocols need to detect failure within a short period of time when nodes move to start updating the routing table in order to find a new primary path to the destination. Meantime, loss of packets and end-to-end delays will increase thereby reducing thr...
Efficient Resource Management Mechanism with Fault Tolerant Model for Computational Grids
Grid computing provides a framework and deployment environment that enables resource sharing, accessing, aggregation and management. It allows resource and coordinated use of various resources in dynamic, distributed virtual organization. The grid scheduling is responsible for resource discovery, resource selection and job assignment over a decentralized heterogeneous system. In the existing sy...
Chapter 7: Replication Management using the State Machine Approach
Most distributed systems employ replicated services in one form or another. By replicating a service, we can support fault-tolerance as well as improving overall throughput by placing server replicas at sites where the service is needed. Protocols for replication management can be divided into two general classes. The first, called "the state machine approach" or "active replication", has no ce...
Fault Tolerance in Transaction Systems
We survey two schemes for fault tolerance for different fault models. The first, the primary-backup approach, deals with disaster recovery. The second is aimed at developing commit protocols that tolerate commission failures. A remote backup database system tracks the state of a primary system, taking over transaction processing when disaster hits the primary site. The primary and backup sites are phy...