Fault Tolerance in a Multi-Layered DRE System: A Case Study
نویسندگان
چکیده
Dynamic resource management is a crucial part of the infrastructure for emerging distributed real-time embedded systems, responsible for keeping mission-critical applications operating and allocating the resources necessary for them to meet their requirements. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes our efforts to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we encountered, some due to the fault tolerance requirements we needed to meet and others due to characteristics of the resource management software. The challenges include the need for extremely rapid recovery; supporting the characteristics of component middleware, including peer-topeer communication and multi-tiered calling semantics; supporting multiple languages; and the co-existence of replicated and non-replicated elements. Making our multilayer dynamic resource manager fault-tolerant required simultaneously overcoming all of these challenges, presenting a significant fault tolerance research challenge.
منابع مشابه
Towards Middleware for Fault-Tolerance in Distributed Real-Time and Embedded Systems
Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, that operate within resource constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure, while the need for simultaneous QoS properties require the middleware to ...
متن کاملModel - Driven Fault - Tolerance Provisioning for Component - Based Distributed Real - Time Embedded Systems
Developing distributed real-time and embedded (DRE) systems require effective strategies to simultaneously handle the challenges of networked systems, enterprise systems, and embedded systems. Component-based model is gaining prominence for the development of DRE systems because of its emphasis on composability, reuse, excellent support for separation of concerns, and explicit staging of develo...
متن کاملDesigning Fault tolerant Mission-Critical Middleware Infrastructure for Distributed Real-time and Embedded Systems?
Fault tolerance is a crucial design consideration for missioncritical distributed real-time and embedded (DRE) systems, such as avionics mission computing systems, and supervisory control and data acquisition systems. Increasingly more of these systems are created using emerging middleware standards, such as publish-subscribe communication services and component based architectures. Most previo...
متن کاملAdding Fault-Tolerance to a Hierarchical DRE System
Dynamic resource management is a crucial part of the infrastructure for emerging mission-critical distributed real-time embedded system. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes an ongoing effort to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we have encountered, includ...
متن کاملStability Assessment Metamorphic Approach (SAMA) for Effective Scheduling based on Fault Tolerance in Computational Grid
Grid Computing allows coordinated and controlled resource sharing and problem solving in multi-institutional, dynamic virtual organizations. Moreover, fault tolerance and task scheduling is an important issue for large scale computational grid because of its unreliable nature of grid resources. Commonly exploited techniques to realize fault tolerance is periodic Checkpointing that periodically ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- JCP
دوره 1 شماره
صفحات -
تاریخ انتشار 2006