The Normalized Distance Preserving Binary Codes and Distance Table
Authors
Abstract
In Euclidean space, approximate nearest neighbor (ANN) search measures similarity by computing Euclidean distances, which incurs high time complexity and large memory overhead. To address these problems, this paper maps the data from Euclidean space into Hamming space while requiring that the normalized distance similarity constraint and the quantization error constraint be satisfied. First, the encoding centers and their binary labels are obtained through a lookup-based mechanism. Then, candidate hashing functions are learnt under the supervision of the binary labels, and those that satisfy the entropy criterion are selected to make the learnt binary codes more distinctive. During training, multiple groups of hashing functions are generated from different kinds of centers, which reduces the adverse influence of the initial centers. The data with the minimal average Hamming distances are returned as the nearest neighbors. In Hamming space, different Euclidean distances may be replaced by one identical value; a distance table is therefore predefined to distinguish the degrees of similarity among data pairs that have the same Hamming distance. Experimental results show that our algorithm outperforms many state-of-the-art methods.
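The ranking idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not say how the encoding centers, the entropy criterion, or the distance table are defined, so everything below is an assumption made for illustration. Here each point takes the binary label of its nearest center, the hypothetical `bit_entropy` helper measures how balanced one bit is, and the hypothetical `build_distance_table` stores Euclidean distances between centers so that `search` can order candidates that share the same Hamming distance.

```python
import math


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer-packed binary codes."""
    return bin(a ^ b).count("1")


def bit_entropy(codes, bit):
    """Shannon entropy (in bits) of one bit position over a set of codes.

    Assumption: a balanced bit (entropy near 1) is treated as more
    distinctive; the paper's actual entropy criterion is not given here.
    """
    ones = sum((c >> bit) & 1 for c in codes)
    p = ones / len(codes)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def encode(point, centers):
    """Assign a point the binary label (list index) of its nearest center."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def build_distance_table(centers):
    """table[i][j] = Euclidean distance between centers i and j (assumed form)."""
    return [[math.dist(ci, cj) for cj in centers] for ci in centers]


def search(query_code, db_codes, table, k):
    """Return indices of the k best database codes.

    Primary key: Hamming distance. Secondary key: table lookup, which
    separates candidates that tie on Hamming distance.
    """
    order = sorted(
        range(len(db_codes)),
        key=lambda i: (hamming(query_code, db_codes[i]),
                       table[query_code][db_codes[i]]),
    )
    return order[:k]


# Hypothetical 2-bit example: four centers whose index is their binary label.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (1.0, 3.0)]
table = build_distance_table(centers)
db_codes = [0b10, 0b01, 0b11]

# Codes 0b10 and 0b01 are both at Hamming distance 1 from 0b00, but the
# table says center 0b01 is closer in Euclidean space, so index 1 comes first.
top2 = search(0b00, db_codes, table, k=2)
```

In this toy setup the Hamming distance alone cannot separate the first two candidates; the table lookup is what restores the Euclidean ordering, which is the role the abstract assigns to the predefined distance table.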
Similar resources
Construction of Linear Codes Having Prescribed Primal-dual Minimum Distance with Applications in Cryptography
A method is given for the construction of linear codes with prescribed minimum distance and also prescribed minimum distance of the dual code. This works for codes over arbitrary finite fields. In the case of binary codes, Matsumoto et al. showed how such codes can be used to construct cryptographic Boolean functions. This new method makes it possible to compute new bounds on the size of such codes, extend...
Binary Gray Codes with Long Bit Runs
We show that there exists an n-bit cyclic binary Gray code all of whose bit runs have length at least n − 3 log_2 n. That is, there exists a cyclic ordering of {0, 1}^n such that adjacent words differ in exactly one (coordinate) bit, and such that no bit changes its value twice in any subsequence of n − 3 log_2 n consecutive words. Such Gray codes are ‘locally distance preserving’ in that Hamming ...
Cosine Similarity Search with Multi Index Hashing
Due to the rapid development of the Internet, recent years have witnessed an explosion in the rate of data generation. Dealing with data at current scales brings unprecedented challenges. From the algorithmic viewpoint, executing existing linear algorithms in information retrieval and machine learning on such tremendous amounts of data incurs intolerable computational and storage costs. To addre...
Comparing apples to apples in the evaluation of binary coding methods
We discuss methodological issues related to the evaluation of unsupervised binary code construction methods for nearest neighbor search. These issues have been widely ignored in the literature. These coding methods attempt to preserve either Euclidean distance or angular (cosine) distance in the binary embedding space. We explain why when comparing a method whose goal is preserving cosine similarit...
Similarity-Preserving Binary Signature for Linear Subspaces
The linear subspace is an important representation for many kinds of real-world data in computer vision and pattern recognition, e.g. faces, motion videos, and speech. In this paper, we first define pairwise angular similarity and angular distance for linear subspaces. The angular distance satisfies non-negativity, identity of indiscernibles, symmetry and the triangle inequality, and thus it is a metric....
Journal title:
- J. Inf. Sci. Eng.
Volume 33, Issue
Pages -
Publication date: 2017