The TPT-RAID Architecture for Box-Fault Tolerant Storage Systems

Authors

  • Yitzhak Birk
  • Erez Zilber

Abstract

TPT-RAID is a multi-box RAID in which each ECC group comprises at most one block from any given storage box, so the system can tolerate the failure of an entire box. It extends the idea of an out-of-band SAN controller into the RAID: data is sent directly between hosts and targets and among targets, while the RAID controller supervises ECC calculation by the targets. Because the controller is not a communication bottleneck, excellent scalability is achieved while retaining the simplicity of centralized control. Moreover, TPT-RAID, whose controller can be a software module within an out-of-band SAN controller, conforms to a conventional switched network architecture, whereas an in-band RAID controller would either constitute a communication bottleneck or would also have to be a full-fledged router. The design is validated in an InfiniBand-based prototype using iSCSI and iSER, and the required changes to the relevant protocols are introduced.
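The two core ideas in the abstract, at most one block per box in each ECC group and ECC computation carried out by the targets under the controller's supervision, can be illustrated with a toy single-parity model. This is a minimal sketch, not the paper's design: it assumes RAID-5-style XOR parity with a rotating placement, and every name in it (`ecc_group_boxes`, `Target`, `write_group`, `small_write`, `reconstruct`) is hypothetical. Everything runs in one Python process, so "transfers" between targets are just local variables.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def ecc_group_boxes(group, n_boxes, width):
    """Hypothetical placement: the boxes holding ECC group `group`.

    Picks `width` distinct boxes (width <= n_boxes) starting at an offset
    that rotates with the group number; the last box holds parity. Because
    the boxes are distinct, the group has at most one block per box.
    """
    assert 2 <= width <= n_boxes
    start = group % n_boxes
    return [(start + k) % n_boxes for k in range(width)]


class Target:
    """Toy storage box: block contents keyed by ECC group number."""

    def __init__(self):
        self.blocks = {}


def write_group(targets, group, data_blocks):
    """Full-group write: store the data blocks and their XOR parity."""
    boxes = ecc_group_boxes(group, len(targets), len(data_blocks) + 1)
    for box, blk in zip(boxes[:-1], data_blocks):
        targets[box].blocks[group] = blk
    targets[boxes[-1]].blocks[group] = xor_blocks(data_blocks)


def small_write(targets, group, width, index, new_data):
    """Update data block `index` of a group and patch the parity.

    Assumed flow: the host sends new_data straight to the data target, which
    computes delta = old XOR new and ships it to the parity target, which
    XORs it into the parity. This function plays the controller's role of
    deciding which targets exchange data; in a real deployment the bytes
    would move between targets rather than through the controller.
    """
    boxes = ecc_group_boxes(group, len(targets), width)
    data_t, parity_t = targets[boxes[index]], targets[boxes[-1]]
    # Assumed to run on the data target.
    delta = xor_blocks([data_t.blocks[group], new_data])
    data_t.blocks[group] = new_data
    # Assumed to run on the parity target after receiving `delta`.
    parity_t.blocks[group] = xor_blocks([parity_t.blocks[group], delta])


def reconstruct(targets, group, width, failed_box):
    """Rebuild the block a failed box held for `group` from the survivors."""
    boxes = ecc_group_boxes(group, len(targets), width)
    assert failed_box in boxes, "failed box holds no block of this group"
    survivors = [targets[b].blocks[group] for b in boxes if b != failed_box]
    return xor_blocks(survivors)
```

For example, with five boxes and four-block groups (three data blocks plus parity), group 7 lands on boxes 2, 3, 4 and 0; after `small_write` updates data block 1 on box 3, `reconstruct` with `failed_box=3` recovers the new block from the three surviving boxes. Keeping the delta computation on the data target is one way to keep the controller off the data path, in the spirit of the abstract.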

Similar sources

Stripped mirroring RAID architecture

Redundant arrays of independent disks (RAID) provide an efficient stable storage system for parallel access and fault tolerance. The most common fault tolerant RAID architectures are RAID-1 and RAID-5. The disadvantage of RAID-1 lies in excessive redundancy, while the write performance of RAID-5 is only 1/4 of that of RAID-0. In this paper, we propose a high performance and highly reliable disk arra...

RAID-RMS: A fault tolerant stripped mirroring RAID architecture for distributed systems

Disk arrays, or RAIDs, have become the solution to increase the capacity, bandwidth and reliability of most storage systems. In spite of its high redundancy level, disk mirroring is a popular RAID paradigm, because replicating data also doubles the bandwidth available for processing read requests, improves the reliability and achieves fault tolerance. In this paper, we present a new RAID archit...

Reliability Models for Highly Fault-tolerant Storage Systems

We found that a reliability model commonly used to estimate Mean-Time-To-Data-Loss (MTTDL), while suitable for modeling RAID 0 and RAID 5, fails to accurately model systems having a fault-tolerance greater than 1. Therefore, modeling the reliability of RAID 6, Triple-Replication, or k-of-n systems requires an alternate technique. In this paper, we explore some alternatives, and evaluate their e...

Fault-Tolerant Distributed Mass Storage for LHC Computing

In this paper we present the concept and first prototyping results of a modular fault-tolerant distributed mass storage architecture for large Linux PC clusters as they are deployed by the upcoming particle physics experiments. The device masquerading technique using an Enhanced Network Block Device (ENBD) enables local RAID over remote disks as the key concept of the ClusterRAID system. The bl...

Reliability Markov models are becoming unreliable (WIP submission)

Markov models have traditionally been used to understand the reliability of storage systems. They provide intuition about the sensitivity of storage system reliability to changes in disk failure rates, rebuild rates, sector failure rates, scrubbing rates, and storage capacity. Unfortunately, as we move towards multi-disk fault tolerant storage systems, i.e., storage systems that tolerate two or...

Publication date: 2007