Randomized approximate nearest neighbors algorithm.

نویسندگان

  • Peter Wilcox Jones
  • Andrei Osipov
  • Vladimir Rokhlin
چکیده

We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x(j)} in R(d), the algorithm attempts to find k nearest neighbors for each of x(j), where k is a user-specified integer parameter. The algorithm is iterative, and its running time requirements are proportional to T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k(2)·(d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N·(d + k). A by-product of the scheme is a data structure, permitting a rapid search for the k nearest neighbors among {x(j)} for an arbitrary point x ∈ R(d). The cost of each such query is proportional to T·(d·(log d) + log(N/k)·k·(d + log k)), and the memory requirements for the requisite data structure are of the order N·(d + k) + T·(d + N). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of {x(j)} and illustrate its performance via several numerical examples.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fast Nearest Neighbor Preserving Embeddings

We show an analog to the Fast Johnson-Lindenstrauss Transform for Nearest Neighbor Preserving Embeddings in `2. These are sparse, randomized embeddings that preserve the (approximate) nearest neighbors. The dimensionality of the embedding space is bounded not by the size of the embedded set n, but by its doubling dimension λ. For most large real-world datasets this will mean a considerably lowe...

متن کامل

Randomized Algorithm for Approximate Nearest Neighbor Search in High Dimensions

A randomized algorithm employs a degree of randomness as part of its logic with uniform random bits as an auxiliary input to guide its behavior, in the hope of achieving good average runtime performance over all possible choices of the random bits. In this paper, we formulate a randomized algorithm capable of finding approximate nearest neighbors, specifically in high dimensional datasets. The ...

متن کامل

Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration

For many computer vision problems, the most time consuming component consists of nearest neighbor matching in high-dimensional spaces. There are no known exact algorithms for solving these high-dimensional problems that are faster than linear search. Approximate algorithms are known to provide large speedups with only minor loss in accuracy, but many such algorithms have been published with onl...

متن کامل

Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning

We quantitatively analyze the performance of exact and approximate nearest-neighbors algorithms on increasingly high-dimensional problems in the context of sampling-based motion planning. We study the impact of the dimension, number of samples, distance metrics, and sampling schemes on the efficiency and accuracy of nearest-neighbors algorithms. Efficiency measures computation time and accuracy...

متن کامل

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Proceedings of the National Academy of Sciences of the United States of America

دوره 108 38  شماره 

صفحات  -

تاریخ انتشار 2011