Reliability analysis of clustered computing systems
Abstract
Clustered computing systems, which use commercially available computers networked in a loosely coupled fashion, can provide high levels of reliability if appropriate levels of error detection and recovery software are implemented in the middleware and application layers. In this paper, we present a modeling approach for analyzing the hardware and software reliability of clustered computing systems. The clustered system is modeled as an irreducible Markov chain with working states, failed states, and intermediate recovery states. Failure and recovery behavior is characterized by the frequency and duration of fault recoveries and outages, both for a single processor in the cluster and for the entire clustered system. We apply the model to a telecommunication switching application that uses the Lucent Technologies Reliable Clustered Computing product, and we present results for a range of processor failure rates and fault recovery coverage factors.
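The abstract does not list the model's states or transition rates, so the following is only a minimal sketch under stated assumptions, not the authors' model. It assumes a continuous-time Markov chain for a single processor with three hypothetical states: working (W), automatic recovery after a covered fault (R), and outage after an uncovered fault (F). Faults arrive at rate λ (`lam`, per hour) and are treated as covered with probability c, the coverage factor; `mu_r` and `mu_f` are assumed recovery and repair rates. The helper names `steady_state`, `processor_generator`, and `metrics`, and all parameter values, are illustrative. Because the chain is irreducible, solving πQ = 0 with Σπ = 1 gives a unique steady state; the frequency of entering a state is then the steady-state flow into it, and its mean duration is one over its exit rate. The cluster-level model is not attempted here.

```python
"""Minimal CTMC sketch of one processor's failure/recovery behavior.

Hypothetical states (an assumption, not the paper's model):
  0 = W: working
  1 = R: automatic fault recovery in progress (covered fault)
  2 = F: outage (uncovered fault, manual repair needed)
All rates are per hour.
"""
from typing import Dict, List

HOURS_PER_YEAR = 8760.0  # ignores leap years


def steady_state(q: List[List[float]]) -> List[float]:
    """Solve pi Q = 0 with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(q)
    # Row i of the augmented matrix is the balance equation (pi Q)_i = 0,
    # whose coefficients are column i of Q.
    a = [[q[j][i] for j in range(n)] + [0.0] for i in range(n)]
    # The balance equations have rank n-1; replace one with sum(pi) = 1.
    a[-1] = [1.0] * n + [1.0]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def processor_generator(lam: float, c: float, mu_r: float, mu_f: float) -> List[List[float]]:
    """Generator matrix Q for the hypothetical W/R/F chain."""
    return [
        [-lam, c * lam, (1.0 - c) * lam],  # W -> R (covered), W -> F (uncovered)
        [mu_r, -mu_r, 0.0],                # R -> W after automatic recovery
        [mu_f, 0.0, -mu_f],                # F -> W after manual repair
    ]


def metrics(lam: float, c: float, mu_r: float, mu_f: float) -> Dict[str, float]:
    """Frequency and duration of recoveries and outages for one processor."""
    pi_w, pi_r, pi_f = steady_state(processor_generator(lam, c, mu_r, mu_f))
    return {
        "availability": pi_w,
        # R and F are entered only from W, so entry frequency = pi_W * entry rate.
        "recoveries_per_year": pi_w * c * lam * HOURS_PER_YEAR,
        "mean_recovery_min": 60.0 / mu_r,
        "outages_per_year": pi_w * (1.0 - c) * lam * HOURS_PER_YEAR,
        "mean_outage_min": 60.0 / mu_f,
        "outage_min_per_year": pi_f * HOURS_PER_YEAR * 60.0,
    }


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the paper).
    mu_r = 120.0  # mean automatic recovery of 0.5 min -> 120 per hour
    mu_f = 0.5    # mean manual repair of 2 h -> 0.5 per hour
    for lam in (1e-4, 1e-3):          # processor failures per hour
        for c in (0.9, 0.99, 0.999):  # fault recovery coverage factor
            m = metrics(lam, c, mu_r, mu_f)
            print(f"lam={lam:g}/h c={c}: outages/yr={m['outages_per_year']:.4f}, "
                  f"outage min/yr={m['outage_min_per_year']:.2f}")
```

Running the script prints the outage frequency and annual outage minutes for each (λ, c) pair. Since outages are entered only from W, annual outage time equals outage frequency times mean outage duration, which serves as a quick consistency check on the solver.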
Similar resources
Investigation on Reliability Estimation of Loosely Coupled Software as a Service Execution Using Clustered and Non-Clustered Web Server
Evaluating the reliability of loosely coupled Software as a Service through the paradigm of a cluster-based and non-cluster-based web server is considered to be an important attribute for the service delivery and execution. We proposed a novel method for measuring the reliability of Software as a Service execution through load testing. The fault count of the model against the stresses of users ...
Reliability Analysis of Rock Wedge Stability: Knowledge-Based Clustered Partitioning Approach
In this paper, a knowledge-based clustered partitioning technique is developed for determining the reliability index and failure probability of a rock wedge. Here, the reliability index is analyzed and the optimization is carried out using a knowledge-based clustered partitioning (KCP) technique. The reliability index computed with this KCP technique is compared with those using other approaches suc...
Action Models: A Reliability Modeling Formalism for Fault-Tolerant Distributed Computing Systems
Modern-day computing system design and development is characterized by increasing system complexity and ever shortening time to market. For modeling techniques to be deployed successfully, they must conveniently deal with complex system models, and must be quick and easy to use by non-specialists. In this paper we introduce “action models,” a modeling formalism that tries to achieve the above g...
Green Energy-aware task scheduling using the DVFS technique in Cloud Computing
Nowadays, energy consumption has become a critical issue in high-performance distributed computing systems, so green computing tries to reduce energy consumption, carbon footprint, and CO2 emissions in high-performance computing systems (HPCs) such as clusters, grids, and clouds that a large number of parallel. Reducing energy consumption for high-end computing can bring various benefits such as red...
Parity Redundancy Strategies in a Large Scale Distributed Storage System
With the deployment of larger and larger distributed storage systems, data reliability becomes more and more of a concern. In particular, redundancy techniques that may have been appropriate in small-scale storage systems and disk arrays may not be sufficient when applied to larger scale systems. We propose a new mechanism called delayed parity generation with active data replication (DPGADR) t...