Two-Level Metadata Management for Data Deduplication System
Abstract
Data deduplication is an essential technique for reducing storage space requirements. Chunking-based data deduplication is particularly effective for backup workloads, which tend to consist of files that evolve slowly, mainly through small changes and additions. In this paper, we introduce a novel data deduplication scheme that can be used efficiently and quickly over low-bandwidth networks. The key points are tree-map searching and classifying data into global data and metadata; these are the main aspects influencing the fast performance of the data deduplication.
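The two-level scheme the abstract describes could be sketched as follows. This is a minimal illustration, not the paper's implementation: a sorted fingerprint list with binary search stands in for the tree map, and the class and method names (`TwoLevelIndex`, `add_chunk`) are hypothetical.

```python
import hashlib
from bisect import bisect_left, insort

class TwoLevelIndex:
    """Sketch of a two-level deduplication index: a global level holding
    chunk fingerprints shared across all backups, and a per-backup
    metadata level recording which chunks make up each backup."""

    def __init__(self):
        self.global_keys = []   # sorted fingerprints (tree-map stand-in)
        self.global_store = {}  # fingerprint -> unique chunk bytes
        self.metadata = {}      # backup_id -> ordered fingerprint list

    def _lookup(self, fp):
        # Binary search over the sorted global level.
        i = bisect_left(self.global_keys, fp)
        return i < len(self.global_keys) and self.global_keys[i] == fp

    def add_chunk(self, backup_id, chunk):
        fp = hashlib.sha256(chunk).hexdigest()
        if not self._lookup(fp):           # unseen chunk: store it once
            insort(self.global_keys, fp)
            self.global_store[fp] = chunk
        # The metadata level only records a reference, never the data.
        self.metadata.setdefault(backup_id, []).append(fp)
        return fp
```

Splitting the index this way keeps the global level small enough to search quickly, while per-backup metadata stays a cheap list of references.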
Similar resources
Design and Implementation of a Library Metadata Management Framework and its Application in Fuzzy Data Deduplication and Data Reconciliation with Authority Data
We describe the application of a generic workflow management system to the problem of metadata processing in the library domain. The requirements for such a framework and acting real-world forces are examined. The design of the framework is laid out and illustrated by means of two example workflows: fuzzy data deduplication and data reconciliation with authority data. Fuzzy data deduplication ...
Metadata Considered Harmful...to Deduplication
Deduplication is widely used to improve space efficiency in storage systems. While much attention has been paid to making the process of deduplication fast and scalable, the effectiveness of deduplication can vary dramatically depending on the data stored. We show that many file formats suffer from a fundamental design property that is incompatible with deduplication: they intersperse metadata ...
Design and Implementation of an Open-Source Deduplication Platform for Research A RESEARCH PROFICIENCY EXAM PRESENTED BY
Data deduplication is a technique used to improve storage utilization by eliminating duplicate data. Duplicate data blocks are not stored; instead, a reference to the original data block is updated. Unique data chunks are identified using techniques such as hashing, and an index of all the existing chunks is maintained. When new data blocks are written, they are hashed and compared to the has...
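The hash-and-index process this abstract outlines could be sketched roughly as below. Fixed-size chunking is assumed for brevity (content-defined chunking is common in practice), and the function names are hypothetical.

```python
import hashlib

def deduplicate(data, chunk_size=4096):
    """Split data into fixed-size chunks, hash each chunk, and store
    only the first copy of each distinct chunk; repeats become
    references (hashes) into the store."""
    store = {}   # hash -> unique chunk bytes
    recipe = []  # ordered hashes needed to rebuild the stream
    for off in range(0, len(data), chunk_size):
        chunk = data[off:off + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)  # keep only the first occurrence
        recipe.append(h)
    return store, recipe

def reconstruct(store, recipe):
    """Rebuild the original stream by following the references."""
    return b"".join(store[h] for h in recipe)
```

A stream of four chunks with only two distinct contents would thus store two chunks plus a four-entry recipe, rather than four full chunks.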
A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems
Deduplication has been largely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems such as no central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a sys...
Deduplication Strategy for Efficient Use of Cloud Storage
With the enormous creation of data in day-to-day life, storing it costs a lot of space, be it on a personal computer, a private cloud, a public cloud, or any reusable media. The storage and transfer cost of data can be reduced by storing a unique copy of duplicate data. This gives birth to data deduplication, which is one of the important data compression techniques and has been widely used in cl...