Failure Trends in a Large Disk Drive Population
Abstract
It is estimated that over 90% of all new information produced in the world is being stored on magnetic media, most of it on hard disk drives. Despite their importance, there is relatively little published work on the failure patterns of disk drives, and the key factors that affect their lifetime. Most available data are based either on extrapolation from accelerated aging experiments or on relatively modest-sized field studies. Moreover, larger population studies rarely have the infrastructure in place to collect health signals from components in operation, which is critical information for detailed failure analysis. We present data collected from detailed observations of a large disk drive population in a production Internet services deployment. The population observed is many times larger than that of previous studies. In addition to presenting failure statistics, we analyze the correlation between failures and several parameters generally believed to impact longevity. Our analysis identifies several parameters from the drive’s self-monitoring facility (SMART) that correlate highly with failures. Despite this high correlation, we conclude that models based on SMART parameters alone are unlikely to be useful for predicting individual drive failures. Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported.
Similar resources
A Disk Architecture for Large Clusters of Workstations
Today and tomorrow’s computing clusters are likely to have one hard drive at every node. Hence large clusters need to be protected against disk failures. Extant solutions either require more than one drive per node or destroy the natural locality of disk accesses resulting from the decomposition of the problem space. We have presented a method avoiding these two pitfalls and showed how it can b...
Designing Efficient Fault Tolerant VOD Storage Servers: Techniques, Analysis, and Comparison
Recent technological advances in digital signal processing, data compression techniques, and high speed computer networking have made Video-on-Demand (VOD) servers feasible. A challenging task in such systems is servicing multiple clients simultaneously while satisfying real-time requirements of continuous delivery of objects at specified bandwidths. To accomplish these tasks and realize economi...
Bayesian approaches to failure prediction for disk drives
Hard disk drive failures are rare but are often costly. The ability to predict failures is important to consumers, drive manufacturers, and computer system manufacturers alike. In this paper we investigate the abilities of two Bayesian methods to predict disk drive failures based on measurements of drive internal conditions. We first view the problem from an anomaly detection stance. We introdu...
Failure analysis of a dynamometer drive shaft coupled to an engine
A typical dynamometer drive shaft was damaged during operation. This failure was repeated in four cases. In the present article, a failure analysis of a dynamometer drive shaft has been performed. To analyze the failure, the material investigation was carried out by scanning electron microscopy (SEM) images. Additionally, the microstructure of the failed coupling shaft was photogra...
Active Disks for Large-Scale Data Processing
As processor performance increases and memory cost decreases, system intelligence continues to move away from the CPU and into peripherals. Storage system designers use this trend toward excess computing power to perform more complex processing and optimizations inside storage devices. To date, such optimizations take place at relatively low levels of the storage protocol. Trends in storage de...