Repair Time in Distributed Storage Systems
Authors
Abstract
In this paper, we analyze a highly distributed backup storage system realized by means of nano datacenters (NaDa). NaDa have recently been proposed as a way to mitigate the growing energy, bandwidth and device costs of traditional data centers, following the popularity of cloud computing. These service provider-controlled peer-to-peer systems take advantage of resources already committed to always-on set-top boxes, of the fact that they generate no heat dissipation costs, and of their proximity to users. In such systems, redundancy is introduced to preserve the data in case of peer failures or departures. To ensure long-term fault tolerance, the storage system must have a self-repairing service that continuously reconstructs the redundancy fragments that are lost. The speed of this reconstruction process is crucial for data survival. It is mainly determined by how much bandwidth, a critical resource in such systems, is available. In the literature, reconstruction times are modeled as independent (e.g., Poissonian, deterministic, or more generally following any distribution). In practice, however, numerous reconstructions start at the same time (when the system detects that a peer has failed). Consequently, they are correlated with each other, because concurrent reconstructions compete for the same bandwidth. This correlation negatively impacts the efficiency of bandwidth utilization and hence the repair time. We propose a new analytical framework that takes this correlation into account when estimating the repair time and the probability of data loss. Specifically, we introduce a queuing model in which reconstructions are served by peers at a rate that depends on the available bandwidth. We show that the load is unbalanced among peers (young peers inherently store less data than old ones). This leads us to introduce a correcting factor on the repair rate of the system.
The proposed models and schemes are validated by mathematical analysis, an extensive set of simulations, and experimentation on the GRID5000 test-bed platform. This new model allows system designers to choose system parameters more accurately as a function of their targeted data durability.
The research leading to these results has received funding from the European Project FP7 EULER, ANR CEDRE, ANR AGAPE, Associated Team AlDyNet, project ECOS-Sud Chile and région PACA.
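To make the correlation argument concrete, the following is a minimal toy sketch and not the paper's model. It assumes that a failed peer triggers a batch of equally sized reconstructions, and that they are served one after another over a single shared bandwidth budget (a FIFO queue). The function names, the equal block sizes and the FIFO discipline are all assumptions made for illustration. Under the independence assumption every reconstruction sees the full bandwidth, whereas in the queued case each reconstruction also waits for the ones ahead of it.

```python
def independent_repair_times(n_blocks, block_size, bandwidth):
    """Repair time of each block if reconstructions did not interact
    (hypothetical baseline: every block gets the full bandwidth)."""
    return [block_size / bandwidth for _ in range(n_blocks)]


def queued_repair_times(n_blocks, block_size, bandwidth):
    """Completion time of each block when all reconstructions start
    together and share one bandwidth budget, served in FIFO order
    (an assumed discipline, chosen only for simplicity)."""
    return [(k + 1) * block_size / bandwidth for k in range(n_blocks)]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Illustrative numbers only: 3 blocks of 10 units, bandwidth 5 units/s.
    n, size, bw = 3, 10.0, 5.0
    print("independent:", independent_repair_times(n, size, bw))
    print("queued:     ", queued_repair_times(n, size, bw))
    print("mean slowdown:",
          mean(queued_repair_times(n, size, bw))
          / mean(independent_repair_times(n, size, bw)))
```

In this toy setting the mean repair time grows with the number of reconstructions triggered by the same failure, which is the kind of effect an independence assumption cannot capture.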
Similar resources
Hybrid Regenerating Codes for Distributed Storage Systems
Distributed storage systems are mainly justified by their ability to store data reliably over unreliable nodes, so that the system can have long-term durability. Recently, regenerating codes have been proposed to balance the repair bandwidth against the storage capacity per node. This is achieved by using network coding. In this paper, a new variation ...
A Non-MDS Erasure Code Scheme for Storage Applications
This paper investigates the use of redundancy and self repairing against node failures in distributed storage systems using a novel non-MDS erasure code. In replication method, access to one replication node is adequate to reconstruct a lost node, while in MDS erasure coded systems which are optimal in terms of redundancy-reliability tradeoff, a single node failure is repaired after recovering the ...
Evaluation of Energy Storage Technologies and Applications Pinpointing Renewable Energy Resources Intermittency Removal
Renewable energy sources (RES), especially wind power plants, have high priority of promotion in energy policies worldwide. An increasing share of RES and distributed generation (DG) should, as has been assumed, improve the reliability of electricity delivery to customers. The paper presented here concentrates on electricity storage systems technologies and applications pinpoint...
Capacity of Wireless Distributed Storage Systems with Broadcast Repair
In wireless distributed storage systems, storage nodes are connected by wireless channels, which are broadcast in nature. This paper exploits this unique feature to design an efficient repair mechanism, called broadcast repair, for wireless distributed storage systems in the presence of multiple-node failures. Due to the broadcast nature of wireless transmission, we advocate a new measure on re...
An Empirical Study of the Repair Performance of Novel Coding Schemes for Networked Distributed Storage Systems
Erasure coding techniques are getting integrated in networked distributed storage systems as a way to provide fault-tolerance at the cost of less storage overhead than traditional replication. Redundancy is maintained over time through repair mechanisms, which may entail large network resource overheads. In recent years, several novel codes tailor-made for distributed storage have been proposed...