Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?
Abstract
In large-scale systems, where the number of components can approach a million, component failure is a significant problem. In this paper, the authors present and analyze failure data from several large systems. More than 100,000 disks with different interfaces, from at least four different vendors, were investigated. According to datasheet figures, the annual failure rate should be at most 0.88%, but the results show that the replacement rate is usually in the range of 2% to 4% and can reach 13%. Another important finding is that infant mortality is negligible, while wear-out starts earlier than expected and the replacement rate increases steadily over time. Replacement rates are at a similar level across disk interfaces. The exponential distribution does not provide a good model for the time between...
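The 0.88% datasheet figure is consistent with the 1,000,000-hour MTTF in the title under the usual conversion of dividing the hours in a year by the MTTF. The sketch below illustrates that arithmetic. The function names are hypothetical, and the conversion assumes a constant failure rate and a 8,760-hour year. Applying the inverse conversion to the observed replacement rates is only a rough illustration, since a replacement is not necessarily a failure.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years are ignored (assumption)


def afr_from_mttf(mttf_hours: float) -> float:
    """Nominal annual failure rate (as a fraction) implied by an MTTF in hours.

    Uses the linear approximation AFR = hours per year / MTTF, which
    assumes a constant failure rate over the disk's life.
    """
    if mttf_hours <= 0:
        raise ValueError("MTTF must be positive")
    return HOURS_PER_YEAR / mttf_hours


def mttf_from_afr(afr: float) -> float:
    """Inverse conversion: MTTF in hours implied by an annual rate (fraction)."""
    if afr <= 0:
        raise ValueError("annual rate must be positive")
    return HOURS_PER_YEAR / afr


if __name__ == "__main__":
    # Datasheet claim from the title: MTTF of 1,000,000 hours.
    print(f"Datasheet AFR: {afr_from_mttf(1_000_000):.2%}")  # prints 0.88%

    # Replacement rates quoted in the abstract, treated here as if they
    # were failure rates (an assumption made only for illustration).
    for rate in (0.02, 0.04, 0.13):
        print(f"{rate:.0%} per year -> about {mttf_from_afr(rate):,.0f} hours")
```

Read this way, a 2% to 4% annual rate corresponds to roughly 219,000 to 438,000 hours, well below the datasheet value.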
Similar resources
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You? (CMU-PDL-06-111)
Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we present and analyze field-gathered disk replacement data from a number of large production systems, including high-performance computing sites and internet services sites. About 100,000 disks are covered by this data, some f...
Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?
Component failure in large-scale IT installations is becoming an ever larger problem as the number of components in a single cluster approaches a million. In this paper, we present and analyze field-gathered disk replacement data from a number of large production systems, including high-performance computing sites and internet services sites. About 100,000 disks are covered by this data, some f...
Calculating MTTF When You Have Zero Failures
How do you calculate the MTTF (Mean Time To Failure) when you have zero failures? If you use the standard equation for MTTF, which is the ratio of total testing time and the number of failures, you get an answer of infinity. In such cases, you define a confidence level between 0 and 100 and then compute the lower bound chi-square value using two degrees of freedom. The equation for calculating ...
Analysis of Conditional MTTF of Fault-Tolerant Systems
Mean time to failure (MTTF) is one of the most frequently used dependability measures in practice. By convention, MTTF is the expected time for a system to reach any one of the failure states. For some systems, however, the mean time to absorb to a subset of the failure states is of interest. Therefore, the concept of conditional MTTF may well be useful. In this paper, we formalize the definitio...
Performance Analysis of Disk Arrays under Failure
Disk arrays (RAID) have been proposed as a possible approach to solving the emerging I/O bottleneck problem. The performance of a RAID system when all disks are operational and the MTTF (mean time to system failure) have been well studied. However, the performance of disk arrays in the presence of failed disks has not received much attention. The same techniques that provide the storage effi...