Design and Validation of Portable Communication Infrastructure for Fault-Tolerant Cluster Middleware

نویسندگان

  • Ming Li
  • Wenchao Tao
  • Daniel Goldberg
  • Israel Hsu
  • Yuval Tamir
چکیده

We describe the communication infrastructure (CI) for our fault-tolerant cluster middleware, which is optimized for two classes of communication: for the applications and for the cluster management middleware. This CI was designed for portability and for efficient operation on top of modern user-level message passing mechanisms. We present a functional fault model for the CI and show how platform-specific faults map to this fault model. Based on this fault model, we have developed a fault injection scheme that is integrated with the CI and is thus portable across different communication technologies. We have used fault injection to validate and evaluate the implementation of the CI itself as well as the cluster management middleware in the presence of communication faults.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

FEMPI: A Lightweight Fault-tolerant MPI for Embedded Cluster Systems

Ever-increasing demands of space missions for data returns from their limited processing and communications resources have made the traditional approach of data gathering, data compression, and data transmission no longer viable. Increasing on-board processing power by providing high-performance computing (HPC) capabilities using commercial-off-the-shelf (COTS) components is a promising approac...

متن کامل

Designing Fault tolerant Mission-Critical Middleware Infrastructure for Distributed Real-time and Embedded Systems?

Fault tolerance is a crucial design consideration for missioncritical distributed real-time and embedded (DRE) systems, such as avionics mission computing systems, and supervisory control and data acquisition systems. Increasingly more of these systems are created using emerging middleware standards, such as publish-subscribe communication services and component based architectures. Most previo...

متن کامل

A Middleware for Constructing Highly Available, Fault Tolerant, and Attack Tolerant Services

This paper describes the design of a middleware that provides support for constructing highly available, secure, fault-tolerant, and attack-tolerant services. The central component of this middleware is a group communication service that comprises of six network protocols: atomic broadcast, group membership, failure detection, attack detection, group access control, and secure intermember commu...

متن کامل

CAFT: Cost-aware and Fault-tolerant routing algorithm in 2D mesh Network-on-Chip

By increasing, the complexity of chips and the need to integrating more components into a chip has made network –on- chip known as an important infrastructure for network communications on the system, and is a good alternative to traditional ways and using the bus. By increasing the density of chips, the possibility of failure in the chip network increases and providing correction and fault tol...

متن کامل

Intrusion-Resilient Middleware Design and Validation

Intrusion Tolerance has become a reference paradigm for dealing with intrusions and accidental faults, achieving security and dependability in an automatic way, much along the lines of classical fault tolerance. This chapter is an introduction to the design and validation of intrusion-tolerant middleware and systems.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002