Block-Ranking: Content Similarity Retrieval Based on Data Partition in Network Storage Environment
Authors
Abstract
Data partitioning plays an important role in eliminating duplicate data in green storage and cloud storage systems. Fixed-size chunking and content-based chunking are two commonly used partitioning methods for breaking a file into a sequence of blocks. Meanwhile, the inverted index has become the standard indexing method in modern information retrieval. For ease of analysis, the inverted index is represented as an inverted matrix, and a MapReduce strategy is used to decompose a complex matrix computation into a set of smaller, parallelizable sub-matrix computations. To make efficient use of block information when calculating file correlation, we propose a novel Block-Ranking method, which uses the number of duplicate blocks to weight the correlation among files. Experiments indicate that the Block-Ranking algorithm improves retrieval efficiency in a network storage environment.
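The idea of weighting file correlation by duplicate blocks can be illustrated with a minimal sketch. It is not the authors' Block-Ranking algorithm: the abstract gives neither the exact weighting formula nor the MapReduce matrix decomposition, so this example assumes fixed-size chunking, SHA-1 block fingerprints, and a score equal to the number of distinct blocks two files share. The names `BLOCK_SIZE`, `fixed_size_blocks`, `build_inverted_index`, and `rank_by_shared_blocks` are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict

BLOCK_SIZE = 4  # hypothetical value, kept tiny for illustration


def fingerprint(block: bytes) -> str:
    """Return a SHA-1 hex digest used as the block's identity."""
    return hashlib.sha1(block).hexdigest()


def fixed_size_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_inverted_index(files: dict[str, bytes]) -> dict[str, set[str]]:
    """Map each block fingerprint to the set of files containing that block."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, data in files.items():
        for block in fixed_size_blocks(data):
            index[fingerprint(block)].add(name)
    return index


def rank_by_shared_blocks(query: str, files: dict[str, bytes],
                          index: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank other files by how many distinct blocks they share with `query`.

    Assumption: the shared-block count stands in for the correlation weight.
    """
    scores: Counter[str] = Counter()
    query_fps = {fingerprint(b) for b in fixed_size_blocks(files[query])}
    for fp in query_fps:
        for name in index[fp]:
            if name != query:
                scores[name] += 1
    return scores.most_common()


files = {
    "a": b"AAAABBBBCCCC",
    "b": b"AAAABBBBDDDD",
    "c": b"AAAAEEEEFFFF",
    "d": b"XXXXYYYYZZZZ",
}
index = build_inverted_index(files)
ranking = rank_by_shared_blocks("a", files, index)
```

Here `ranking` lists `b` ahead of `c`, because `b` shares two blocks with `a` and `c` shares one; `d` shares none and does not appear. Content-based chunking could replace `fixed_size_blocks` without changing the index or the ranking step.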
Similar resources
Learning Similarity Matching in Multimedia Content-Based Retrieval
Many multimedia content-based retrieval systems allow query formulation with user setting of relative importance of features (e.g., color, texture, shape, etc.) to mimic the user's perception of similarity. However, the systems do not modify their similarity matching functions, which are defined during the system development. In this paper, we present a neural network-based learning algorithm f...
Progressive Content-Sensitive Data Retrieval in Sensor Networks
Problem statement: For a sensor network comprising autonomous and self-organizing data sources, efficient similarity-based search for semantically rich resources (such as video data) is considered a challenging task due to the lack of infrastructure and multiple limitations (such as bandwidth, storage and energy). While past research has discussed much on routing protocols for sensor...
Content Based Image Retrieval in Similarity Integrated Network Using Combined Ranking Algorithm
On image-rich websites such as Flickr and Photobucket, retrieving images from such a large network is very useful but also a very tedious process. It is also time-consuming, and the retrieved content is not always relevant. In this paper, three algorithms have been proposed to improve the performance of such sites. To compare the similarity of the images efficiently, HMok-SimRank algorithm is us...
Utilizing Context in Ranking Results from Distributed CBIR
Selection and ranking of relevant images from image collections remains a problem in content-based image retrieval. This problem becomes even more visible and acute when attempting to merge and rank multiple result sets retrieved from a distributed database environment. This paper presents findings from a project that investigated if combining text and image retrieval algorithms with the use of...
Scalable Locality-Sensitive Hashing for Similarity Search in High-Dimensional, Large-Scale Multimedia Datasets
Similarity search is critical for many database applications, including the increasingly popular online services for Content-Based Multimedia Retrieval (CBMR). These services, which include image search engines, must handle an overwhelming volume of data, while keeping low response times. Thus, scalability is imperative for similarity search in Webscale applications, but most existing methods a...
Journal title:
- JDCTA
Volume 4, Issue
Pages -
Publication date: 2010