Proactive management of software aging

Authors

  • Vittorio Castelli
  • Richard E. Harper
  • Philip Heidelberger
  • Steven W. Hunter
  • Kishor S. Trivedi
  • Kalyanaraman Vaidyanathan
  • William P. Zeggert

Abstract

Software failures are now known to be a dominant source of system outages. Several studies and much anecdotal evidence point to “software aging” as a common phenomenon, in which the state of a software system degrades with time. Exhaustion of system resources, data corruption, and numerical error accumulation are the primary symptoms of this degradation, which may eventually lead to performance degradation of the software, crash/hang failure, or other undesirable effects. “Software rejuvenation” is a proactive technique intended to reduce the probability of future unplanned outages due to aging. The basic idea is to pause or halt the running software, refresh its internal state, and resume or restart it. Software rejuvenation can be performed by relying on a variety of indicators of aging, or on the time elapsed since the last rejuvenation. In response to the strong desire of customers to be provided with advance notice of unplanned outages, our group has developed techniques that detect the occurrence of software aging due to resource exhaustion, estimate the time remaining until the exhaustion reaches a critical level, and automatically perform proactive software rejuvenation of an application, process group, or entire operating system, depending on the pervasiveness of the resource exhaustion and our ability to pinpoint the source. This technology has been incorporated into the IBM Director for xSeries servers. To quantitatively evaluate the impact of different rejuvenation policies on the availability of cluster systems, we have developed analytical models based on stochastic reward nets (SRNs). For time-based rejuvenation policies, we determined the optimal rejuvenation interval based on system availability and cost. We also analyzed a rejuvenation policy based on prediction, and showed that it can further increase system availability and reduce downtime cost.
These models are very general and can capture a multitude of cluster system characteristics, failure behavior, and performability measures, which we are just beginning to explore.
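
The abstract describes estimating the time remaining until a resource's exhaustion reaches a critical level and rejuvenating before that happens. The following is a minimal sketch of that idea, assuming periodic samples of a single resource (for example free memory), a linear depletion trend fitted by ordinary least squares, and a fixed lead time for scheduling rejuvenation. The function names, the linear model, and the thresholds are hypothetical; the estimator actually used in IBM Director may differ (robust trend estimators or nonlinear curve fits are common alternatives).

```python
from typing import Optional, Sequence


def estimate_time_to_exhaustion(
    times: Sequence[float],
    free: Sequence[float],
    critical_level: float,
) -> Optional[float]:
    """Estimate how long until a depleting resource reaches critical_level.

    Fits free = intercept + slope * t by ordinary least squares (an assumed,
    deliberately simple trend model) and returns the time remaining after the
    most recent sample until the fitted line crosses critical_level (0.0 if it
    already has). Returns None when the fitted trend is flat or increasing,
    i.e. no exhaustion is predicted. Assumes times are in increasing order.
    """
    n = len(times)
    if n < 2 or n != len(free):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_f = sum(free) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, free))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_t
    if slope >= 0:
        return None  # resource is not being depleted
    crossing_time = (critical_level - intercept) / slope
    return max(0.0, crossing_time - times[-1])


def should_rejuvenate(time_left: Optional[float], lead_time: float) -> bool:
    """Hypothetical trigger: rejuvenate once predicted exhaustion is
    nearer than the lead time needed to rejuvenate gracefully."""
    return time_left is not None and time_left <= lead_time
```

For example, hourly free-memory readings of 1000, 950, 900, 850 and 800 MB with a critical level of 200 MB give a fitted slope of -50 MB per hour and an estimate of 12 hours remaining, so `should_rejuvenate(12.0, 24.0)` returns True. Choosing the lead time trades earlier (possibly unnecessary) rejuvenations against the risk of the prediction arriving too late.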

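For time-based policies, the paper derives the optimal rejuvenation interval from SRN models of cluster systems. As a much simpler stand-in, and not the authors' model, the sketch below treats a single node as a renewal process: the node runs until either a failure (Weibull-distributed time to failure X, survival function R) or a scheduled rejuvenation at age T, whichever comes first, and each cycle ends with a downtime of rejuv_down or repair_down. Steady-state availability is then A(T) = E[min(X, T)] / (E[min(X, T)] + rejuv_down * R(T) + repair_down * (1 - R(T))). The Weibull parameters, downtimes, and candidate grid are illustrative assumptions.

```python
import math
from typing import Iterable


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """R(t) = P(X > t) for a Weibull-distributed time to failure X."""
    return math.exp(-((t / scale) ** shape))


def mean_cycle_uptime(T: float, shape: float, scale: float, steps: int = 2000) -> float:
    """E[min(X, T)] = integral of R(t) over [0, T], via the trapezoidal rule."""
    h = T / steps
    total = 0.5 * (weibull_survival(0.0, shape, scale) + weibull_survival(T, shape, scale))
    for i in range(1, steps):
        total += weibull_survival(i * h, shape, scale)
    return total * h


def availability(T: float, shape: float, scale: float,
                 rejuv_down: float, repair_down: float) -> float:
    """Steady-state availability when the node is rejuvenated at age T.

    Each renewal cycle ends either in a planned rejuvenation (probability
    R(T), downtime rejuv_down) or in a failure (probability 1 - R(T),
    downtime repair_down).
    """
    up = mean_cycle_uptime(T, shape, scale)
    survive = weibull_survival(T, shape, scale)
    down = rejuv_down * survive + repair_down * (1.0 - survive)
    return up / (up + down)


def best_interval(candidates: Iterable[float], shape: float, scale: float,
                  rejuv_down: float, repair_down: float) -> float:
    """Grid search for the rejuvenation interval that maximizes availability."""
    return max(candidates,
               key=lambda T: availability(T, shape, scale, rejuv_down, repair_down))
```

With made-up values shape = 2.0 (an increasing failure rate, i.e. aging), scale = 720 hours, a 0.1-hour rejuvenation and a 2-hour repair, `best_interval(range(24, 2000, 24), 2.0, 720.0, 0.1, 2.0)` lands on an interior interval on the order of a week. With shape = 1.0 (exponential failures, no aging), availability in this model only increases with T, so time-based rejuvenation does not pay off. A downtime-cost criterion can be handled the same way by weighting the two downtime terms with different per-hour costs.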

Similar sources

Modeling and Analysis of Software Aging and Rejuvenation

Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of “software aging”, one in which the state of the software system degrades with time, has been reported. To counteract this phenomenon, a proactive approach of fault management, called “software rejuvenation”, has been proposed. This essentially involves gracefully terminating an application or a...

Investigating the mediating role of work-family conflict in the relationship between proactive personality trait and marital satisfaction

Marital satisfaction plays an important role in the couple's life expectancy. It has been found that this construct is related to the occupational and personality characteristics of the couples. Accordingly, in this study, the mediating role of work-family conflict in the relationship between proactive personality and marital satisfaction has been assessed. The sample consisted of 142 married m...

Robust and Adaptive Modeling of Software Aging

The widespread phenomenon of software (image) aging is known to cause performance degradation, transient failures or even crashes of applications. This undesired behavior is especially visible in long-running software such as web and application servers and enterprise always-on applications, software deployed frequently in Grid and utility computing environments. The management c...

A Comprehensive Approach to Software Aging and Rejuvenation on a Single Node Software System

The phenomenon of software aging is dominant in modern software systems, affecting their behavior and leading to major and minor failures, which hamper their overall performance. The effects of software aging on software systems are associated with major failures in the recent past, encouraging scientists to work towards proposing vital solutions to the problem. A preventive and proactive solut...

Refined non-homogeneous Markovian models for a single-server type of software system with rejuvenation

Long running software systems are known to experience an aging phenomenon called software aging, one in which the accumulation of errors during the execution of software leads to performance degradation and eventually results in failure. To counteract this phenomenon a proactive fault management approach, called software rejuvenation, is particularly useful. It essentially involves gracefully t...

Journal:
  • IBM Journal of Research and Development

Volume 45

Published: 2001