Locality Sensitive Hashing Based Clustering
Abstract
Definition: In learning systems with kernels, the shape and size of a kernel play a critical role in accuracy and generalization. Most kernels have a distance metric parameter, which determines the size and shape of the kernel in the sense of a Mahalanobis distance. Advanced kernel learning methods tune every kernel's distance metric individually, instead of tuning one global distance metric for all kernels.
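To make the role of the distance metric concrete, here is a minimal sketch, assuming a Gaussian kernel whose exponent is a squared Mahalanobis distance. The function names and the example matrices are illustrative assumptions, not taken from the source; the sketch only contrasts one shared (global) metric with a metric chosen for a single kernel.

```python
import numpy as np


def mahalanobis_sq(x, center, metric):
    """Squared Mahalanobis distance (x - c)^T M (x - c).

    `metric` is assumed to be a symmetric positive semi-definite matrix M.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(diff @ metric @ diff)


def gaussian_kernel(x, center, metric):
    """Gaussian kernel whose size and shape are set by the metric M."""
    return float(np.exp(-0.5 * mahalanobis_sq(x, center, metric)))


center = np.zeros(2)
x = np.array([2.0, 0.0])

# One global metric shared by all kernels (identity: an isotropic kernel).
global_metric = np.eye(2)

# A metric tuned for this kernel alone (hypothetical values):
# wide along axis 0, narrow along axis 1.
local_metric = np.diag([0.25, 4.0])

k_global = gaussian_kernel(x, center, global_metric)  # exp(-0.5 * 4.0)
k_local = gaussian_kernel(x, center, local_metric)    # exp(-0.5 * 1.0)
```

With the per-kernel metric, the same point `x` receives a larger kernel response because that kernel has been stretched along the first axis; a single global metric cannot express this for one kernel without changing all the others.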
Similar resources
Kernelized Locality-Sensitive Hashing for Semi-Supervised Agglomerative Clustering
Large scale agglomerative clustering is hindered by computational burdens. We propose a novel scheme where exact inter-instance distance calculation is replaced by the Hamming distance between Kernelized Locality-Sensitive Hashing (KLSH) hashed values. This results in a method that drastically decreases computation time. Additionally, we take advantage of certain labeled data points via distanc...
Efficient Clustering of Metagenomic Sequences using Locality Sensitive Hashing
The new generation of genomic technologies has allowed researchers to determine the collective DNA of organisms (e.g., microbes) co-existing as communities across the ecosystem (e.g., within the human host). There is a need for computational approaches to analyze and annotate the large volumes of available sequence data from such microbial communities (metagenomes). In this paper, we devel...
OPI-JSA at CLEF 2017: Author Clustering and Style Breach Detection
In this paper, we propose methods for the author identification task, which is divided into author clustering and style breach detection. Our solution to the first problem consists of locality-sensitive hashing based clustering of real-valued vectors, which are mixtures of stylometric features and bag of n-grams. For the second problem, we propose a statistical approach based on some different tf-idf featur...
A Content-based Music Similarity Retrieval Scheme by Using BoW Representation and LSH-based Retrieval
This extended abstract presents detailed information about a content-based music similarity retrieval scheme, which is based on locality sensitive hashing (LSH). Our scheme considers MFCC and time histogram (TH) as two major features to represent the properties of audio music similarity. Next, each feature is depicted by Bag of Words (BoW), which k-means clustering summarizes extracted f...
Scalable Techniques for Clustering the Web
Clustering is one of the most crucial techniques for dealing with the massive amount of information present on the web. Clustering can either be performed once offline, independent of search queries, or performed online on the results of search queries. Our offline approach aims to efficiently cluster similar pages on the web, using the technique of Locality-Sensitive Hashing (LSH), in which we...
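Several of the abstracts above rely on the same basic idea: hash each item to a short binary code so that similar items tend to share bits, then compare or group items by code instead of by exact distance. The following is a minimal sketch of that idea using plain random-hyperplane (sign) hashing. It is not KLSH and not the method of any paper listed here; all function names, the seed, and the sample points are illustrative assumptions.

```python
from collections import defaultdict

import numpy as np


def hyperplane_codes(points, n_bits, seed=0):
    """Hash each row of `points` to `n_bits` sign bits.

    Each bit records which side of a random hyperplane through the
    origin the point falls on.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((points.shape[1], n_bits))
    return (points @ planes >= 0).astype(np.uint8)


def hamming(a, b):
    """Number of bit positions where two codes differ."""
    return int(np.count_nonzero(a != b))


def bucket_clusters(codes):
    """Group point indices whose codes are identical (one bucket per code)."""
    buckets = defaultdict(list)
    for idx, code in enumerate(codes):
        buckets[code.tobytes()].append(idx)
    return list(buckets.values())


# Hypothetical data: two points near (1, 0) and two near (-1, 0).
points = np.array([[1.0, 0.1], [0.9, 0.2], [-1.0, -0.1], [-0.8, -0.3]])
codes = hyperplane_codes(points, n_bits=8)
clusters = bucket_clusters(codes)

# Points 0 and 2 are exact negatives of each other, so every bit differs.
opposite = hamming(codes[0], codes[2])
```

Exact-match bucketing is the crudest grouping rule. A gentler variant, closer in spirit to replacing distance computations with Hamming distances, would merge codes whose Hamming distance falls below a chosen threshold.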