Fault Tolerance in Multi-Core Systems

Author

  • Stefan Reif
Abstract

Modern processors provide multiple cores for parallel computing. This paper describes how parallel processing on multiple cores can provide efficient fault tolerance. In general, multi-core processors provide structural redundancy, which can be exploited for efficient replication. Furthermore, hardware features can improve the performance of redundant execution by exchanging information between replicas. However, multi-core performance can also be exploited for fault tolerance without dedicated hardware support. This seminar paper introduces four fault-tolerance approaches, each of which exploits multi-core processors. A comparison of the techniques shows the advantages and disadvantages of each.

Similar resources

A Fault Observant Real-Time Embedded Design for Network-on-Chip Control Systems

Performance and time-to-market requirements cause many real-time designers to consider commercial off-the-shelf (COTS) components for real-time systems. Massive multi-core embedded processors with network-on-chip (NoC) designs to facilitate core-to-core communication are becoming common in COTS. These architectures benefit real-time scheduling, but they also pose predictability challenges. In this work, w...

Automating Fault Tolerance in High-Performance Computational Biological Jobs Using Multi-Agent Approaches

BACKGROUND: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost on the time taken for reinstating the job and the risk of losing data and execution accomplished by the job before it failed. Approaches which can proactively...

Design and Analysis of Transient Fault Tolerance for Multi Core Architecture

This paper describes a software approach to fault tolerance for shared-memory multi-core systems using PLR. PLR takes a software-centric approach to transient fault tolerance that ensures correct software execution. The scheme operates at the user-space level and does not require changes to the original application. PLR creates a set of redundant processes per application process. In this scheme ...

Distributed Real-Time Fault Tolerance on a Virtualized Multi-Core System

This paper presents different approaches for real-time fault tolerance using redundancy methods for multi-core systems. Using hardware virtualization, a distributed system on a chip is created, where the cores are isolated from one another except through explicit communication channels. Using this system architecture, redundant tasks that would typically be run on separate processors can be con...

Publication date: 2015