DiskReduce: RAIDing the Cloud
Abstract
Data-Intensive Scalable Computing (DISC) file systems such as HDFS employ replication for reliability, typically giving users only about a third of the raw disk capacity. In this project, we investigate DiskReduce, a framework for integrating RAID into these replicated storage systems to lower storage capacity overhead, for example from 200% to 25% when triplicated data is dynamically replaced with 8+2 RAID 6 encoding. We gathered usage data from large HDFS DISC systems and found that DISC files are huge relative to files in traditional and HPC file systems, but because DISC blocks are also huge, per-file RAID wastes significant capacity. We therefore chose to encode blocks across files. We also studied the implications of reading RAIDed data for MapReduce job performance, measuring the read performance benefits of replication that would be lost with erasure encoding. We found that triplicated files can be read at higher bandwidth than single-copy files, as expected, but this advantage is perhaps smaller than expected and is absent in many cases.
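The overhead figures quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from DiskReduce itself: the function names are hypothetical, and the per-file calculation assumes that a partially filled stripe still carries its full set of parity blocks, which is one plausible reason small block counts make per-file RAID expensive.

```python
import math


def replication_overhead(copies):
    """Extra capacity, as a percentage of user data, for n-way replication."""
    return (copies - 1) * 100.0


def erasure_overhead(data_blocks, parity_blocks):
    """Extra capacity, as a percentage of user data, for a full k+m stripe."""
    return parity_blocks / data_blocks * 100.0


def usable_fraction(overhead_percent):
    """Fraction of raw disk capacity left for user data."""
    return 1.0 / (1.0 + overhead_percent / 100.0)


def per_file_overhead(file_blocks, data_blocks, parity_blocks):
    """Overhead when each file is encoded on its own.

    Assumption (not from the source): a partial stripe still needs
    all `parity_blocks` parity blocks.
    """
    stripes = math.ceil(file_blocks / data_blocks)
    return stripes * parity_blocks / file_blocks * 100.0


if __name__ == "__main__":
    print("3 copies:", replication_overhead(3), "% overhead")
    print("8+2 RAID 6:", erasure_overhead(8, 2), "% overhead")
    print("usable share with 3 copies:", usable_fraction(replication_overhead(3)))
    for blocks in (1, 3, 8, 9):
        print(blocks, "block file, per-file 8+2:",
              round(per_file_overhead(blocks, 8, 2), 1), "% overhead")
```

Under these assumptions, a file that occupies a single block pays the same 200% overhead as triplication, and only files that fill whole stripes reach 25%. Encoding stripes across blocks from different files avoids that dependence on file size.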