Fault Tolerance in Multi-Core Systems
Author
Abstract
Modern processors provide multiple cores for parallel computing. This paper describes how parallel processing on multiple cores can provide efficient fault tolerance. In general, multi-core processors offer structural redundancy that can be exploited for efficient replication. Hardware features can further improve the performance of redundant execution by exchanging information between replicas. However, multi-core performance can also be exploited for fault tolerance without dedicated hardware support. This seminar paper introduces four fault-tolerance approaches, each of which exploits multi-core processors, and compares them to show the advantages and disadvantages of each technique.
Similar resources
A Fault Observant Real-Time Embedded Design for Network-on-Chip Control Systems
Performance and time-to-market requirements cause many real-time designers to consider commercial off-the-shelf (COTS) components for real-time systems. Massive multi-core embedded processors with network-on-chip (NoC) designs to facilitate core-to-core communication are becoming common in COTS. These architectures benefit real-time scheduling, but they also pose predictability challenges. In this work, w...
Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches
BACKGROUND: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more of the computing cores on which they execute fail. This incurs not only the cost of maintaining the job, but also the time taken to reinstate the job and the risk of losing the data and computation the job completed before it failed. Approaches which can proactively...
Design and Analysis of Transient Fault Tolerance for Multi Core Architecture
This paper describes a software approach to fault tolerance for shared-memory multi-core systems using process-level redundancy (PLR). PLR takes a software-centric approach to transient fault tolerance that ensures correct software execution. The scheme operates at user-space level and does not require changes to the original application. PLR creates a set of redundant processes per application process. In this scheme ...
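The idea of running a set of redundant processes per application process can be illustrated with a minimal sketch. It is not the PLR implementation: it assumes three replicas and a strict majority vote on their results, neither of which is stated in the abstract above, and `checksum`, `run_redundant`, and `majority_vote` are hypothetical names chosen for illustration.

```python
import multiprocessing as mp
from collections import Counter


def checksum(data):
    """Example workload (assumed): a simple deterministic checksum."""
    total = 0
    for byte in data:
        total = (total * 31 + byte) % 1_000_003
    return total


def _replica(func, arg, conn):
    """Run one replica in its own process and send back its result."""
    conn.send(func(arg))
    conn.close()


def majority_vote(results):
    """Return the value reported by a strict majority of replicas."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replicas: %r" % (results,))
    return value


def run_redundant(func, arg, replicas=3):
    """Execute func(arg) in `replicas` processes and vote on the output."""
    pipes, procs = [], []
    for _ in range(replicas):
        recv_end, send_end = mp.Pipe(duplex=False)
        proc = mp.Process(target=_replica, args=(func, arg, send_end))
        proc.start()
        send_end.close()  # parent keeps only the receiving end
        pipes.append(recv_end)
        procs.append(proc)
    results = [conn.recv() for conn in pipes]
    for proc in procs:
        proc.join()
    return majority_vote(results)


if __name__ == "__main__":
    # Each replica can be scheduled on a different core by the OS.
    print(run_redundant(checksum, b"multi-core"))
```

With three replicas, a single replica that returns a corrupted value is outvoted by the other two. The sketch does not handle a replica that crashes before sending a result, which a real scheme would also need to detect.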
Distributed Real-Time Fault Tolerance on a Virtualized Multi-Core System
This paper presents different approaches for real-time fault tolerance using redundancy methods for multi-core systems. Using hardware virtualization, a distributed system on a chip is created, where the cores are isolated from one another except through explicit communication channels. Using this system architecture, redundant tasks that would typically be run on separate processors can be con...