approximate-nn-searching

Best data structure for high dimensional nearest neighbor search

元气小坏坏 · Submitted on 2020-06-25 05:32:32
Question: I'm currently working on high-dimensional data (~50,000-100,000 features) on which nearest-neighbor search must be performed. I know that kd-trees perform poorly as the dimensionality grows, and I've also read that, in general, all space-partitioning data structures tend to degrade to exhaustive search on high-dimensional data. Additionally, there are two important facts to be considered (ordered by relevance): Precision: the nearest neighbors must be found (not approximations). Speed: The …
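Since the question insists on exact results, the practical baseline at this dimensionality is a blocked brute-force scan rather than a tree. A minimal NumPy sketch; the array names and block size are illustrative assumptions:

    # Sketch: exact 1-NN by blocked brute force. Array names and the block
    # size are illustrative assumptions, not from the question.
    import numpy as np

    def exact_nn(data, queries, block=256):
        # Squared norms of the database rows, computed once.
        sq = np.einsum("ij,ij->i", data, data)
        out = np.empty(len(queries), dtype=np.int64)
        for q0 in range(0, len(queries), block):
            q = queries[q0:q0 + block]
            # Squared Euclidean distances from each query in the block to
            # every database row: |q|^2 - 2 q.x + |x|^2.
            d2 = (np.einsum("ij,ij->i", q, q)[:, None]
                  - 2.0 * (q @ data.T) + sq[None, :])
            out[q0:q0 + block] = np.argmin(d2, axis=1)
        return out  # index into data of the exact nearest neighbor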

How to look up the most similar bit vector in a prefix tree for NN search?

三世轮回 · Submitted on 2019-12-14 03:24:18
Question: The problem I'm trying to solve is explained in this question: Finding the single nearest neighbor using a Prefix tree in O(1)? My question is about the Proposed solution section on that question's page. That section says we find the nearest neighbor from each prefix tree by traversing the tree starting from the root node. Looking up whether a key exists in a prefix tree is straightforward, but I don't understand at all how to get the most similar one. How is this accomplished? …
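One standard way to do it (which may differ from the proposal referenced above) is a best-first traversal that prefers the branch matching the query bit and prunes any branch whose accumulated mismatches already reach the best Hamming distance found. A sketch; the dict-of-dicts trie layout and fixed-length '0'/'1' strings are assumptions made for illustration:

    import heapq

    def build_trie(bitstrings):
        root = {}
        for s in bitstrings:
            node = root
            for b in s:
                node = node.setdefault(b, {})
            node["$"] = s                      # leaf marker stores the key
        return root

    def nearest(root, query):
        # Heap entries: (mismatches so far, depth, tiebreak, node). The
        # mismatch count lower-bounds the final Hamming distance on that
        # branch, so popping in that order lets us stop early.
        heap = [(0, 0, id(root), root)]
        best_dist, best_key = len(query) + 1, None
        while heap:
            cost, depth, _, node = heapq.heappop(heap)
            if cost >= best_dist:
                break                          # nothing left can improve
            if "$" in node:                    # reached a stored bit string
                best_dist, best_key = cost, node["$"]
                continue
            for bit, child in node.items():
                step = 0 if bit == query[depth] else 1
                heapq.heappush(heap, (cost + step, depth + 1, id(child), child))
        return best_dist, best_key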

What is the ε (epsilon) parameter in Locality Sensitive Hashing (LSH)?

好久不见. · Submitted on 2019-12-11 01:37:10
Question: I've read the original paper on Locality Sensitive Hashing. The complexity is a function of the parameter ε, but I don't understand what it is. Can you explain its meaning, please? Answer 1: ε is the approximation parameter. LSH (like FLANN and kd-GeRaF) is designed for high-dimensional data. In that space exact k-NN doesn't work well; in fact, it is almost as slow as brute force, because of the curse of dimensionality. For that reason, we focus on solving approximate k-NN. Check Definition 1 from …
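For reference, the standard definition that answer points to: given a query q whose true nearest neighbor is p*, an algorithm solves the (1 + ε)-approximate nearest neighbor problem if it may return any point p with

    dist(q, p) <= (1 + ε) · dist(q, p*)

so ε = 0 recovers the exact problem, and larger values of ε trade precision for speed.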

How to find the closest 2 points in a 100 dimensional space with 500,000 points?

梦想与她 · Submitted on 2019-12-02 18:12:08
I have a database with 500,000 points in a 100-dimensional space, and I want to find the closest 2 points. How do I do it? Update: the space is Euclidean, sorry. And thanks for all the answers. BTW, this is not homework. Answer: You could try the ANN library, but that only gives reliable results up to 20 dimensions. Answer (Nikita Rybak): There's a chapter in Introduction to Algorithms devoted to finding the two closest points in two-dimensional space in O(n log n) time. You can check it out on Google Books. In fact, I suggest it to everyone, as the way they apply the divide-and-conquer technique to this problem is very …
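For scale: with n = 500,000 the full pairwise distance matrix has roughly 1.25 × 10^11 entries, so an exact scan has to be tiled. A minimal NumPy sketch of that blocked exact closest-pair search; the block size and names are illustrative assumptions:

    import numpy as np

    def closest_pair(points, block=2048):
        n = len(points)
        sq = np.einsum("ij,ij->i", points, points)   # squared norms, once
        best = (np.inf, -1, -1)
        for i0 in range(0, n, block):
            a = points[i0:i0 + block]
            for j0 in range(i0, n, block):
                b = points[j0:j0 + block]
                # Squared Euclidean distances between the two tiles.
                d2 = (sq[i0:i0 + block, None] + sq[None, j0:j0 + block]
                      - 2.0 * (a @ b.T))
                if i0 == j0:
                    np.fill_diagonal(d2, np.inf)     # mask self-pairs
                k = int(np.argmin(d2))
                r, c = divmod(k, d2.shape[1])
                if d2[r, c] < best[0]:
                    best = (d2[r, c], i0 + r, j0 + c)
        return best  # (squared distance, index of one point, index of the other)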

Search in locality sensitive hashing

二次信任 · Submitted on 2019-11-29 15:42:53
I'm trying to understand Section 5 of this paper about LSH, in particular how to bucket the generated hashes. Quoting the linked paper: Given bit vectors consisting of d bits each, we choose N = O(n^(1/(1+ε))) random permutations of the bits. For each random permutation σ, we maintain a sorted order O_σ of the bit vectors, in lexicographic order of the bits permuted by σ. Given a query bit vector q, we find the approximate nearest neighbor by doing the following: for each permutation σ, we perform a binary search on O_σ to locate the two bit vectors closest to q (in the …
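A sketch of the quoted scheme, under the assumption that the bit vectors are stored as '0'/'1' strings; N and the probe width around the insertion point are illustrative, not from the paper:

    import bisect, random

    def build_index(vectors, d, N=10, seed=0):
        rng = random.Random(seed)
        perms, orders = [], []
        for _ in range(N):
            perm = list(range(d))
            rng.shuffle(perm)
            perms.append(perm)
            # Store (permuted string, original index), sorted
            # lexicographically: this is the sorted order O_sigma.
            order = sorted((''.join(v[i] for i in perm), idx)
                           for idx, v in enumerate(vectors))
            orders.append(order)
        return perms, orders

    def query(q, vectors, perms, orders, probe=4):
        best = (len(q) + 1, None)
        for perm, order in zip(perms, orders):
            pq = ''.join(q[i] for i in perm)
            # Binary search the permuted query into O_sigma, then check a
            # few neighbors on both sides by true Hamming distance.
            pos = bisect.bisect_left(order, (pq, -1))
            for _, idx in order[max(0, pos - probe):pos + probe]:
                dist = sum(a != b for a, b in zip(q, vectors[idx]))
                if dist < best[0]:
                    best = (dist, idx)
        return best  # (Hamming distance, index of approximate nearest neighbor)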

Two sets of high dimensional points: Find the nearest neighbour in the other set

怎甘沉沦 · Submitted on 2019-11-29 12:28:55
I have 2 sets: A and B. Both sets contain the same number of high-dimensional points. How do I find the nearest neighbour in set A for every point in set B? I thought about using a Voronoi diagram, but it seems (according to Wikipedia) that it is not suitable for dimensions higher than 2. Can someone suggest a method to me, please? Answer (gsamaras): FLANN. If your data really do lie in a high-dimensional space, then you could use FLANN. It builds a number of rotated kd-trees and queries (a bit of) every single tree, keeping the best results found. It also rotates the data set to avoid nasty cases.
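A minimal sketch of that set-to-set query with the pyflann bindings; the data here are made up and the index parameters (trees, checks) are illustrative, not tuned:

    import numpy as np
    from pyflann import FLANN

    A = np.random.rand(10000, 128).astype(np.float32)
    B = np.random.rand(10000, 128).astype(np.float32)

    flann = FLANN()
    # For each row of B: the index into A of its (approximate) nearest
    # neighbor, and the corresponding distance.
    indices, dists = flann.nn(A, B, num_neighbors=1,
                              algorithm="kdtree", trees=8, checks=128)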
