A Dynamic Deduplication Approach for Big Data Storage
Author
Abstract
Data grows every day, and managing storage devices under this explosive growth of digital data is a very challenging task, so data reduction has become a crucial problem. Deduplication plays a vital role in removing redundancy in large-scale cluster computing storage: it improves storage utilization by eliminating redundant copies of data and keeping only one copy on the storage devices. In this paper we propose a dynamic deduplication approach with higher efficiency, in which the data files are distributed across multiple tightly coupled computers shared by multiple users. As a data elimination approach, it exploits data redundancy. Data deduplication first divides large data objects into smaller parts called chunks and represents them by their unique hash values, computed with MD5 or SHA-1, to identify duplicate data. Experimental results with real big data show that the presented deduplication approach improves the DER (data elimination ratio) and saves storage space.
Keywords: Deduplication, Whole file chunking, Content based chunking, Fixed size chunking, Deduplication ratio, Deduplication gain
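The chunk-and-hash step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses fixed-size chunking with SHA-1, the function names `deduplicate`, `restore`, and `elimination_ratio` are hypothetical, and the ratio is assumed to be original bytes divided by stored bytes, since the abstract does not define DER.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep one copy per unique chunk."""
    store = {}   # SHA-1 digest -> chunk bytes (one copy per unique chunk)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha1(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a repeated digest stores nothing new
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


def elimination_ratio(data: bytes, store) -> float:
    """Assumed definition: original size divided by stored size."""
    stored = sum(len(chunk) for chunk in store.values())
    return len(data) / stored if stored else 1.0


data = b"ABCDABCDABCDWXYZ"
store, recipe = deduplicate(data, chunk_size=4)
print(len(store), len(recipe), elimination_ratio(data, store))
```

In this toy input, three of the four chunks are identical, so only two chunks are stored. Real systems use much larger chunks and often content-based chunk boundaries, which this sketch does not attempt.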
Similar resources
A Review Paper on Hybrid Cloud Approach for Secure Authorized Data Deduplication
Cloud computing is the best concept for handling big databases as the world moves towards digitization. The amount of digital data in the world is growing exponentially with time. Thus, employing storage optimization techniques is an essential requirement for large storage areas like cloud storage. Data de the best storage optimization techniqu...
Boafft: Distributed Deduplication for Big Data Storage in the Cloud
As data progressively grows within data centers, the cloud storage systems continuously face challenges in saving storage capacity and providing capabilities necessary to move big data within an acceptable time frame. In this paper, we present the Boafft, a cloud storage system with distributed deduplication. The Boafft achieves scalable throughput and capacity using multiple data servers to dedu...
Offline Selective Data Deduplication for Primary Storage Systems
Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunat...
A Scalable Inline Cluster Deduplication Framework for Big Data Protection
Cluster deduplication has become a widely deployed technology in data protection services for Big Data to satisfy the requirements of service level agreement (SLA). However, it remains a great challenge for cluster deduplication to strike a sensible tradeoff between the conflicting goals of scalable deduplication throughput and high duplicate elimination ratio in cluster systems with low-end in...
Concurrent deletion in a distributed content-addressable storage system with global deduplication
Scalable, highly reliable distributed systems supporting data deduplication have recently become popular for storing backup and archival data. One of the important requirements for backup storage is the ability to delete data selectively. Unlike in traditional storage systems, data deletion in distributed systems with deduplication is a major challenge because deduplication leads to multiple ow...