Search results for: fault recovery
Number of results: 262091
A distributed fault-tolerant system for real-time process control based on an enhancement of the distributed recovery block is described. Coverage is provided for failures in hardware, system software, networks, and application software. Fault tolerance provisions are introduced at the system level and in application software using an architecture based on the distributed recovery block (DRB). ...
The increasing size and complexity of software systems make it hard to prevent or remove all possible faults. Faults that remain in the system can eventually lead to a system failure. Fault tolerance techniques are introduced for enabling systems to recover and continue operation when they are subject to faults. Many fault tolerance techniques are available but incorporating them i...
This paper presents the architecture of a Wavelength Division Multiplexing (WDM) optical packet network, called WONDER, that was designed and prototyped in the PhotonLab at Politecnico di Torino, Italy. The design and implementation of the WONDER network aim to assess the effectiveness of optical technologies with respect to electronic ones, trying to identify an optimal mix of the two technolo...
With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault tolerant; most are in need of a seamless recovery framework. Among the automatic fault-tolerant techniques proposed for MPI, message logging is preferable for its scalable recovery. The major challenge for message logging protocols i...
This paper discusses the modeling and analysis of three major fault-tolerant software system architectures: DRB (Distributed Recovery Blocks), NVP (N-Version Programming) and NSCP (N Self-Checking Programming). In the system-level reliability modeling domain, fault tree analysis techniques and Markov reward modeling techniques are combined to incorporate transient and permanent hardware faults...
We present a model for application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on a full replication of all parallel application components in a distributed wide-area environment in which each replica is independently scheduled in a different site. A system architecture for coordinati...
This paper presents a solution for the problem of transparent recovery of asynchronous distributed computation on clusters of workstations when a fault occurs on a node. If the system has fault-tolerant features, it can survive the fault and continue its computations. Performance degradation is unavoidable when hardware redundancies are not available. It is a large advantage if the long-runtim...