Block-Ranking: Content Similarity Retrieval Based on Data Partition in Network Storage Environment

Authors

  • Jingli Zhou
  • Ke Liu
  • Leihua Qin
  • Xuejun Nie
Abstract

Data partitioning now plays an important role in eliminating duplicate data in green storage and cloud storage systems. Fixed-size chunking and content-based chunking are two commonly used partition methods that break a file into a sequence of blocks. Meanwhile, the inverted index has become the standard indexing method in modern information retrieval. For convenient analysis, the inverted index is represented as an inverted matrix, and a MapReduce strategy is used to decompose a complex matrix computation into a set of smaller, parallelizable sub-matrix computations. To use block information efficiently when calculating file correlation, we propose a novel Block-Ranking method that uses the number of duplicate blocks to weight the correlation among files. The experiment indicates that the Block-Ranking algorithm markedly improves retrieval efficiency in a network storage environment.
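
The idea of weighting file correlation by the number of duplicate blocks can be illustrated with a small sketch. The abstract does not give the actual Block-Ranking formula, so the code below is only an assumption-laden toy: it uses fixed-size chunking with a hypothetical 4-byte block size, SHA-1 block fingerprints, an in-memory inverted index from block fingerprint to file names, and a raw shared-block count as the score. The function names and sample files are invented for illustration, and the MapReduce decomposition is not modeled.

```python
import hashlib
from collections import Counter, defaultdict

# Hypothetical block size, kept tiny so the toy files below share blocks.
BLOCK_SIZE = 4


def chunk_fixed(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Fixed-size chunking: split data into consecutive blocks of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def block_ids(data: bytes) -> set[str]:
    """Fingerprint each block; identical blocks get identical fingerprints."""
    return {hashlib.sha1(block).hexdigest() for block in chunk_fixed(data)}


def build_inverted_index(files: dict[str, bytes]) -> dict[str, set[str]]:
    """Map every block fingerprint to the set of files that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, data in files.items():
        for bid in block_ids(data):
            index[bid].add(name)
    return index


def rank_by_shared_blocks(query: str, files: dict[str, bytes],
                          index: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Score other files by how many distinct blocks they share with `query`.

    The raw count is an assumed stand-in for the paper's weighting.
    """
    scores: Counter[str] = Counter()
    for bid in block_ids(files[query]):
        for other in index.get(bid, ()):
            if other != query:
                scores[other] += 1
    # Highest count first; ties broken by file name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented sample data: b.txt shares three blocks with a.txt, c.txt shares one.
files = {
    "a.txt": b"AAAABBBBCCCCDDDD",
    "b.txt": b"AAAABBBBCCCCXXXX",
    "c.txt": b"AAAAYYYYZZZZWWWW",
    "d.txt": b"QQQQRRRRSSSSTTTT",
}
index = build_inverted_index(files)
ranking = rank_by_shared_blocks("a.txt", files, index)
print(ranking)  # [('b.txt', 3), ('c.txt', 1)]
```

Because lookups go through the inverted index, only files that share at least one block with the query are ever scored; `d.txt` never appears in the ranking.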

Similar resources

Learning Similarity Matching in Multimedia Content-Based Retrieval

Many multimedia content-based retrieval systems allow query formulation with user setting of relative importance of features (e.g., color, texture, shape, etc.) to mimic the user's perception of similarity. However, the systems do not modify their similarity matching functions, which are defined during the system development. In this paper, we present a neural network-based learning algorithm f...

Progressive Content-Sensitive Data Retrieval in Sensor Networks

Problem statement: For a sensor network comprising autonomous and self-organizing data sources, efficient similarity-based search for semantic-rich resources (such as video data) has been considered as a challenging task due to the lack of infrastructures and the multiple limitations (such as band-width, storage and energy). While the past research discussed much on routing protocols for sensor...

Content Based Image Retrieval in Similarity Integrated Network Using Combined Ranking Algorithm

In image-rich websites such as Flickr and Photobucket, retrieving images from such a large network is very useful but also a very tedious process. It also consumes more time, and the retrieved contents are not always exactly relevant. In this paper, three algorithms have been proposed to improve the performance of such sites. To compare the similarity of the images efficiently, HMok-SimRank algorithm is us...

Utilizing Context in Ranking Results from Distributed CBIR

Selection and ranking of relevant images from image collections remains a problem in content-based image retrieval. This problem becomes even more visible and acute when attempting to merge and rank multiple result sets retrieved from a distributed database environment. This paper presents findings from a project that investigated if combining text and image retrieval algorithms with the use of...

Scalable Locality-Sensitive Hashing for Similarity Search in High-Dimensional, Large-Scale Multimedia Datasets

Similarity search is critical for many database applications, including the increasingly popular online services for Content-Based Multimedia Retrieval (CBMR). These services, which include image search engines, must handle an overwhelming volume of data, while keeping low response times. Thus, scalability is imperative for similarity search in Webscale applications, but most existing methods a...

Journal:
  • JDCTA

Volume 4  Issue 

Pages  -

Publication date 2010