Rejuvenation and Failure Detection in Partitionable Systems

Authors

Christof Fetzer

Karin Högstedt

Abstract

Certain gateways (e.g., some cable or DSL modems) are known to have low reliability and low availability. Most failures of these devices can, however, be “fixed” by rejuvenating the device after a failure has been detected. Such a detection-based rejuvenation strategy increases the availability of these gateways. In the scenario considered, rejuvenation is non-trivial because a failure of such a gateway leaves it partitioned away from the network. In particular, network operators who want to rejuvenate these gateways are in a different network partition and therefore cannot initiate a remote rejuvenation. In this paper we propose a failure-detection-based rejuvenation service and a remote detection service. The rejuvenation service detects and fixes “soft” failures automatically (in one partition), and the detection service detects (in another partition) every rejuvenation exactly once, within a bounded amount of time, even when the gateway is rejuvenated several times in a row. The detection service also allows the detection of “hard” failures and the filtering of notifications of soft failures.
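The abstract does not say how the remote detection service achieves exactly-once detection, so the following is only an illustrative sketch of one possible approach, not the authors' protocol. It assumes the gateway keeps a persistent rejuvenation counter and attaches it to every heartbeat it sends once connectivity returns. The names `RemoteDetector`, `on_heartbeat`, `hard_timeout`, and the heartbeat format are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RemoteDetector:
    """Hypothetical detector running in the operator's partition."""

    hard_timeout: float                 # assumed silence bound before a hard failure is suspected
    last_epoch: Optional[int] = None    # highest rejuvenation counter seen so far
    last_heard: Optional[float] = None  # arrival time of the latest accepted heartbeat
    events: List[str] = field(default_factory=list)

    def on_heartbeat(self, epoch: int, now: float) -> None:
        """Process a heartbeat carrying the gateway's rejuvenation counter."""
        if self.last_epoch is not None and epoch < self.last_epoch:
            return  # stale or reordered message: ignore it
        if self.last_epoch is not None:
            # Report each counter value that was skipped or newly reached,
            # so back-to-back rejuvenations are each reported once.
            for e in range(self.last_epoch + 1, epoch + 1):
                self.events.append(f"rejuvenation #{e}")
        self.last_epoch = epoch
        self.last_heard = now

    def hard_failure_suspected(self, now: float) -> bool:
        """True if no heartbeat has been accepted for longer than hard_timeout."""
        return self.last_heard is not None and now - self.last_heard > self.hard_timeout


detector = RemoteDetector(hard_timeout=30.0)
detector.on_heartbeat(epoch=0, now=0.0)   # first contact, nothing to report
detector.on_heartbeat(epoch=2, now=20.0)  # two rejuvenations happened while partitioned
detector.on_heartbeat(epoch=2, now=25.0)  # duplicate heartbeat, no new report
print(detector.events)
```

Under these assumptions, duplicate and reordered heartbeats never produce a second notification, because a counter value is reported only when it exceeds the highest value already seen. Prolonged silence is treated as a suspected hard failure, since a soft failure would be expected to end with a rejuvenation and a fresh heartbeat.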

Similar resources

Optimal Rejuvenation Scheduling of Distributed Computation Based on Dynamic Programming

Recently, a complementary approach to handle transient software failures, called software rejuvenation, is becoming popular as a proactive fault management technique in operational software systems. In this study, we develop the optimal scheduling algorithms to trigger software rejuvenation in distributed computation circumstance. In particular, we focus on two different computation circumstanc...

Implementing Diamond P with Bounded Messages on a Network of ADD Channels

We present an implementation of the eventually perfect failure detector (♦P) from the original hierarchy of the Chandra-Toueg [4] oracles on an arbitrary partitionable network composed of unreliable channels that can lose and reorder messages. Prior implementations of ♦P have assumed different partially synchronous models ranging from bounded point-to-point message delay and reliable communica...

How the Time-Before-Failure Reacts to Periodic Rejuvenation

Rebooting is one of the commonly used approaches to recover from undesired crash or performance degradation in software systems. Recently, however, planned and periodic restart or rejuvenation has been proposed as a reliability management tool for avoiding unwanted failure of long-running systems. This paper presents an interesting observation that periodic rejuvenation alters the lifetime dist...

Self-healing in payment switches with a focus on failure detection using State Machine-based approaches

Composition, change and complexity have attracted everyone’s attention towards Self-Adaptive systems. These systems, inspired by the human body, are capable of adapting to changes in the inner and outer environment. The main objective of this study is to achieve a more convenient availability for e-banking services in the payment switch, using self-healing systems and focusing on the failur...

Quiescent Reliable Communication and Quiescent Consensus in Partitionable Networks

We consider partitionable networks with process crashes and lossy links, and focus on the problems of reliable communication and consensus for such networks. For both problems we seek algorithms that are quiescent, i.e., algorithms that eventually stop sending messages. We first tackle the problem of reliable communication for partitionable networks by extending the results of [ACT97a]. In part...

Publication date: 2001
