Optimal Primary-Backup Protocols

Authors

  • Navin Budhiraja
  • Keith Marzullo
  • Fred B. Schneider
  • Sam Toueg
Abstract

One way to implement a fault-tolerant service is to employ multiple sites that fail independently. The state of the service is replicated and distributed among these sites, and updates are coordinated so that even when a subset of the sites fail, the service remains available. A common approach to structuring such replicated services is to designate one site as the primary and all the others as backups. Clients make requests by sending messages only to the primary. If the primary fails, then a failover occurs and one of the backups takes over. This service architecture is commonly called the primary-backup or the primary-copy approach [1]. In [5] we give lower bounds for implementing primary-backup protocols under various models of failure. These lower bounds constrain the degree of replication, the time during which the service can be without a primary, and the amount of time it can take to respond to a client request. In this paper, we show that most of these lower bounds are tight by giving matching protocols. Some of the protocols that we describe have surprising properties. In one case, the optimal protocol is one in which a non-faulty primary is forced to relinquish control to a backup that it knows to be faulty! However, the existence of such a scenario is not peculiar to our protocol. As shown in [5], relinquishing control to a faulty backup is indeed necessary to achieve optimal protocols in some failure models. Another surprise is that in some protocols that achieve optimal response time, the site that receives the request (i.e., the primary) is not the site that sends the response to the clients. We show that this anomaly is not idiosyncratic to our protocols; it is necessary for achieving optimal response time.
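
The primary-backup structure described in the abstract can be illustrated with a minimal Python sketch. This is not one of the paper's protocols: the names `Site`, `PrimaryBackupService`, `request`, and `crash` are hypothetical, crashes are assumed to be detected instantly and perfectly, and messages are assumed never to be lost or delayed. Those assumptions remove exactly the timing and failure-model questions that the lower bounds address, so the sketch shows only the basic roles of primary and backup.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One replica holding a full copy of the service state (hypothetical)."""
    name: str
    state: dict = field(default_factory=dict)
    alive: bool = True

    def apply(self, key, value):
        self.state[key] = value


class PrimaryBackupService:
    """Toy service: the first live site acts as primary, the rest as backups."""

    def __init__(self, names):
        self.sites = [Site(n) for n in names]

    def primary(self):
        # Assumption: crashes are detected instantly, so the primary is simply
        # the first site that has not crashed. Moving to the next live site
        # models the failover step.
        for site in self.sites:
            if site.alive:
                return site
        raise RuntimeError("all sites have failed")

    def request(self, key, value):
        # Clients send requests only to the primary.
        primary = self.primary()
        primary.apply(key, value)
        # The primary forwards the update to every live backup before replying.
        for site in self.sites:
            if site.alive and site is not primary:
                site.apply(key, value)
        return primary.name  # name of the site that served the request

    def crash(self, name):
        for site in self.sites:
            if site.name == name:
                site.alive = False


service = PrimaryBackupService(["s1", "s2", "s3"])
first = service.request("x", 1)    # served while s1 is primary
service.crash("s1")                # primary fails; a backup takes over
second = service.request("y", 2)   # served by the new primary
survivor_state = service.primary().state
```

In this sketch the primary also sends the response, which is the simplest choice; the abstract notes that some response-time-optimal protocols have a different site reply to the client.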

Similar resources

Weak Leader Election in the receive-omission failure model

ions that enables one to more expressively represent failure scenarios by considering failures that are not independent or not identically distributed; 4) although it allows for faulty processes to be elected, correct processes are able to detect it, enabling the use of alarms to indicate failures in the system. Relating to our discussion on Primary-Backup protocols, by assuming that faulty pro...

A Driven Backup Routing Table to Find Alternative Dijoint Path in Ad Hoc Wireless

The performance of the routing protocols is important since they compute the primary path between source and destination. In addition, routing protocols need to detect failure within a short period of time when nodes move to start updating the routing table in order to find a new primary path to the destination. Meantime, loss of packets and end-to-end delays will increase thereby reducing thr...

Efficient Resource Management Mechanism with Fault Tolerant Model for Computational Grids

Grid computing provides a framework and deployment environment that enables resource sharing, accessing, aggregation and management. It allows resource and coordinated use of various resources in dynamic, distributed virtual organization. The grid scheduling is responsible for resource discovery, resource selection and job assignment over a decentralized heterogeneous system. In the existing sy...

Chapter 7: Replication Management using the State Machine Approach

Most distributed systems employ replicated services in one form or another. By replicating a service, we can support fault-tolerance as well as improving overall throughput by placing server replicas at sites where the service is needed. Protocols for replication management can be divided into two general classes. The first, called "the state machine approach" or "active replication", has no ce...

Fault Tolerance in Transaction Systems

We survey two schemes for fault tolerance for different fault models. The first, the primary-backup approach, deals with disaster recovery. The second is aimed at developing commit protocols that tolerate commission failures. A remote backup database system tracks the state of a primary system, taking over transaction processing when disaster hits the primary site. The primary and backup sites are phy...

Publication date: 1992