A Dynamic Deduplication Approach for Big Data Storage

Author

Harshita Sharma

Abstract

As the volume of digital data grows every day, managing storage devices for this explosive growth has become a very challenging task, and data reduction has become a crucial problem. Deduplication plays a vital role in removing redundancy in large-scale cluster computing storage: it improves storage utilization by eliminating redundant copies of data and saving only one copy in the storage devices. In this paper we propose a dynamic deduplication approach with higher efficiency, in which data files are distributed across multiple tightly coupled computers shared by multiple users. As an effective data elimination approach, it exploits data redundancy. Data deduplication first divides large data objects into smaller parts called chunks and represents each chunk by a unique hash value, computed with MD5 or SHA1, to identify duplicate data. Experimental results with real big data show that the proposed deduplication approach improves the DER (data elimination ratio) and saves storage space.

Keywords: Deduplication, Whole file chunking, Content based chunking, Fixed size chunking, Deduplication ratio, Deduplication gain
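The chunk-then-fingerprint pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's dynamic, multi-node approach: it runs on a single machine and only contrasts the three chunking strategies named in the keywords. All function and class names (`whole_file_chunks`, `fixed_size_chunks`, `content_defined_chunks`, `DedupStore`), the chunk-size parameters, the rolling-hash constants, and the definition of DER as logical bytes written divided by physical bytes stored are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: names, parameters, and the DER definition are
# hypothetical assumptions, not the paper's method.
import hashlib
import random


def whole_file_chunks(data: bytes) -> list:
    """Whole-file chunking: the entire file is treated as one chunk."""
    return [data] if data else []


def fixed_size_chunks(data: bytes, size: int = 4096) -> list:
    """Fixed-size chunking: equal blocks; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
                           min_size: int = 32, max_size: int = 1024) -> list:
    """Content-based chunking with a Rabin-Karp style rolling hash.

    A boundary is placed where the low bits of the hash (selected by `mask`)
    are all zero, once the chunk is at least `min_size` bytes long; chunks are
    cut unconditionally at `max_size`. The hash restarts at each boundary.
    """
    base, mod = 257, (1 << 31) - 1
    out_weight = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i - start >= window:
            h = (h - data[i - window] * out_weight) % mod  # slide the window
        h = (h * base + b) % mod
        length = i - start + 1
        if (length >= min_size and (h & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class DedupStore:
    """Stores one copy of each unique chunk, keyed by its hash fingerprint."""

    def __init__(self, algo: str = "sha1"):  # "md5" also works with hashlib.new
        self.algo = algo
        self.chunks = {}        # fingerprint -> chunk bytes (physical storage)
        self.recipes = {}       # file name -> ordered list of fingerprints
        self.logical_bytes = 0  # total bytes written by users

    def put(self, name: str, chunks: list) -> None:
        recipe = []
        for c in chunks:
            fp = hashlib.new(self.algo, c).hexdigest()
            self.chunks.setdefault(fp, c)  # keep only the first copy
            recipe.append(fp)
            self.logical_bytes += len(c)
        self.recipes[name] = recipe

    def get(self, name: str) -> bytes:
        """Rebuild a file from its recipe of fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.recipes[name])

    def physical_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())

    def der(self) -> float:
        """Assumed DER: logical bytes written / physical bytes stored."""
        stored = self.physical_bytes()
        return self.logical_bytes / stored if stored else 1.0


if __name__ == "__main__":
    original = random.Random(42).randbytes(16384)      # requires Python 3.9+
    edited = original[:5000] + b"X" + original[5000:]  # one-byte insertion
    for label, chunker in [("whole-file", whole_file_chunks),
                           ("fixed-size", fixed_size_chunks),
                           ("content-based", content_defined_chunks)]:
        store = DedupStore()
        store.put("v1", chunker(original))
        store.put("v2", chunker(edited))
        assert store.get("v2") == edited
        print(f"{label:14s} DER = {store.der():.2f}")
```

In this setup a single inserted byte should defeat whole-file deduplication entirely and shift every fixed-size block after the edit, whereas content-based boundaries depend only on local content and tend to resynchronize within a chunk or two, so the content-based store would be expected to report the highest DER. The minimum and maximum chunk sizes are a common safeguard against degenerate chunk lengths; the values used here are arbitrary.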

Similar resources

A Review Paper on Hybrid Cloud Approach for Secure Authorized Data Deduplication

Cloud computing is the best concept to handle big databases as the world moves towards digitization. The amount of digital data in the world is growing exponentially with time. Thus, employing storage optimization techniques is an essential requirement for large storage areas like cloud storage. Cloud computing is the best concept to handle big datasets. Data de the best storage optimization techniqu...

Boafft: Distributed Deduplication for Big Data Storage in the Cloud

As data progressively grows within data centers, the cloud storage systems continuously face challenges in saving storage capacity and providing capabilities necessary to move big data within an acceptable time frame. In this paper, we present the Boafft, a cloud storage system with distributed deduplication. The Boafft achieves scalable throughput and capacity using multiple data servers to dedu...

Offline Selective Data Deduplication for Primary Storage Systems

Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be considered equally with the deduplication ratio. Unfortunat...

A Scalable Inline Cluster Deduplication Framework for Big Data Protection

Cluster deduplication has become a widely deployed technology in data protection services for Big Data to satisfy the requirements of service level agreement (SLA). However, it remains a great challenge for cluster deduplication to strike a sensible tradeoff between the conflicting goals of scalable deduplication throughput and high duplicate elimination ratio in cluster systems with low-end in...

Concurrent deletion in a distributed content-addressable storage system with global deduplication

Scalable, highly reliable distributed systems supporting data deduplication have recently become popular for storing backup and archival data. One of the important requirements for backup storage is the ability to delete data selectively. Unlike in traditional storage systems, data deletion in distributed systems with deduplication is a major challenge because deduplication leads to multiple ow...

Publication date: 2016
