nearest-neighbor

How to find k nearest neighbors to the median of n distinct numbers in O(n) time?

Submitted by 社会主义新天地 on 2019-12-09 04:18:29
Question: I can use the median-of-medians selection algorithm to find the median in O(n). Also, I know that after the algorithm is done, all the elements to the left of the median are less than the median and all the elements to the right are greater than it. But how do I find the k nearest neighbors to the median in O(n) time? If the median is n, the numbers to the left are less than n and the numbers to the right are greater than n. However, the array is not sorted on the left or the right
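A minimal sketch of one standard O(n) approach (function and variable names are illustrative, not from the question): select the median with a linear-time selection routine, compute every element's distance to it, then run one more linear-time selection over the distances to keep the k smallest. numpy.partition is used below as a stand-in for the selection step; it is expected O(n) (introselect), whereas a median-of-medians select would give the worst-case O(n) guarantee.

import numpy as np

def k_nearest_to_median(arr, k):
    # Sketch only: np.partition plays the role of a linear-time selection.
    a = np.asarray(arr)
    n = len(a)
    median = np.partition(a, n // 2)[n // 2]    # select the median in O(n)
    dist = np.abs(a - median)                   # distance of every element to the median
    kth = np.partition(dist, k - 1)[k - 1]      # k-th smallest distance, again O(n)
    return a[dist <= kth][:k]                   # any k elements within that distance

print(k_nearest_to_median([9, 1, 5, 7, 3, 8, 2], k=3))   # values closest to the median 5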

How to calculate the nearest neighbors using weka from the command line?

Submitted by 风流意气都作罢 on 2019-12-08 16:14:24
Question: I have a CSV file where each row is a vector of numbers representing a data point. I want to use Weka from the command line to calculate the nearest neighbor of each data point in the CSV file. I know how to do k-nearest-neighbor classification from the command line, but that's not what I want; I want the actual neighbors. How do I do this? I want to do this using Weka and not some other tool. Answer 1: Weka doesn't have a one-liner to do what I think you are suggesting (ingest a file, convert it

How can I get the index of the Nearest Point when I use CGAL::K_neighbor_search to do the Nearest Neighbor Search?

Submitted by 风格不统一 on 2019-12-08 13:07:30
Question: I am using CGAL's K_neighbor_search module for the nearest neighbor search problem. It's nice and easy to use. The example code shows that, given a query point, it can find the nearest neighbor point from a set of points as well as the distance. However, I can only get the nearest neighbor point itself; I don't know how to get the index of the point found by the algorithm. For example, I use the following code: std::list<Point_d> points; Tree tree(points.begin(), points.end()); Neighbor

Convert longitude/latitude coordinates (WGS) to a grid with equidistant axes (in a given area)

Submitted by 穿精又带淫゛_ on 2019-12-08 11:56:21
Question: I have a lot of geocoordinates in two data sets and want to run a nearest-neighbor search. I came across the package 'RANN', and the function nn2(x,y) runs really fast. Now there is the problem that, around London, a degree to the north is of course quite a bit longer than a degree to the west. My idea was to convert the location coordinates to some grid where one step in the x direction is nearly the same distance as one step in the y direction. The area is London (Center -0.1045,51
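A minimal sketch of the usual local fix (an equirectangular approximation around a reference latitude; the function name, the reference latitude and the km-per-degree constant are illustrative assumptions): scale longitude by cos(latitude) so that one unit along x covers roughly the same ground distance as one unit along y. Over a city-sized area like Greater London the distortion of this approximation is negligible.

import numpy as np

def to_local_xy(lon, lat, ref_lat=51.5):
    # 1 degree of latitude is roughly 111.32 km everywhere; 1 degree of longitude
    # shrinks to roughly 111.32 * cos(latitude) km. Using a fixed reference latitude
    # keeps the transform linear, which is fine over a small area.
    km_per_deg = 111.32
    x = np.asarray(lon) * km_per_deg * np.cos(np.radians(ref_lat))
    y = np.asarray(lat) * km_per_deg
    return np.column_stack([x, y])

# The transformed coordinates can then be fed to a fast KD-tree search
# (e.g. RANN::nn2 in R, or scipy's cKDTree) with a plain Euclidean metric.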

Finding nearest neighbor between 2 sets of dated points

Submitted by *爱你&永不变心* on 2019-12-08 07:56:39
Question: I have 2 sets of points, set1 and set2. Both sets of points have data associated with each point. Points in set1 are "ephemeral" and exist only on the given date. Points in set2 are "permanent": they are constructed at a given date and then exist forever after that date. set.seed(1) dates <- seq(as.Date('2011-01-01'),as.Date('2011-12-31'),by='days') set1 <- data.frame(lat=40+runif(10000), lon=-70+runif(10000),date=sample(dates,10000,replace=TRUE)) set2 <- data.frame(lat=40+runif(100), lon=-70
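A minimal sketch of one way to respect the date constraint (column names and the use of scipy's cKDTree are assumptions, and lon/lat are treated as planar for brevity): for each ephemeral date, build a KD-tree over only the permanent points constructed on or before that date and query it with that day's ephemeral points. Rebuilding a tree per date is cheap here because set2 is small.

import numpy as np
from scipy.spatial import cKDTree

def nearest_existing(set1, set2):
    # For each row of set1, the index (into set2 sorted by date) of the nearest
    # set2 point that already exists on set1's date; -1 if none exists yet.
    set1 = set1.reset_index(drop=True)
    set2 = set2.sort_values('date').reset_index(drop=True)
    result = np.full(len(set1), -1)
    for d, grp in set1.groupby('date'):
        available = set2[set2['date'] <= d]
        if available.empty:
            continue
        tree = cKDTree(available[['lon', 'lat']].values)
        _, idx = tree.query(grp[['lon', 'lat']].values, k=1)
        result[grp.index] = available.index[idx]
    return result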

Why do we need a coarse quantizer?

Submitted by Deadly on 2019-12-08 07:29:24
Question: In Product Quantization for Nearest Neighbor Search, section IV.A says they will use a coarse quantizer too (which, the way I see it, is just a much smaller product quantizer, smaller w.r.t. k, the number of centroids). I don't really get why this helps the search procedure, and the cause might be that I don't understand the way they use it. Any ideas please? Answer 1: As mentioned in the NON EXHAUSTIVE SEARCH section, approximate nearest neighbor search with product quantizers is fast and significantly reduces the memory requirements for storing the descriptors.
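A minimal numpy sketch of the idea behind the paper's inverted-file scheme (sizes, names and the random "codebook" are illustrative assumptions, not the paper's actual training procedure): the coarse quantizer assigns every database vector to one of a few cells and builds an inverted list per cell, and the product quantizer then only encodes the residual (vector minus its cell centroid). At query time only the lists of the few closest cells are scanned, so the search becomes non-exhaustive.

import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 32))              # toy database vectors

# Coarse quantizer: a small codebook of centroids (random picks here for brevity;
# the paper learns them with k-means).
n_cells = 64
coarse_centroids = X[rng.choice(len(X), n_cells, replace=False)]

# Assign every vector to its nearest coarse centroid and build inverted lists.
assign = cdist(X, coarse_centroids).argmin(axis=1)
inverted_lists = {c: np.where(assign == c)[0] for c in range(n_cells)}

# The product quantizer then encodes only the residual X[i] - centroid[assign[i]].
# At query time, just the inverted lists of the w nearest cells are visited and
# distances are estimated from the PQ codes of those residuals.
q = rng.normal(size=(1, 32))
w = 4
probe = cdist(q, coarse_centroids)[0].argsort()[:w]
candidates = np.concatenate([inverted_lists[c] for c in probe])
print(f"scanning {len(candidates)} of {len(X)} vectors")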

How to hash lists?

Submitted by 泄露秘密 on 2019-12-08 07:16:52
Question: Lists are not hashable. However, I am implementing LSH and I am looking for a hash function that will map a list of positive integers (in [1, 29000]) to k buckets. The number of lists is D, where D > k (I think) and D = 40,000; k is not yet known (open to suggestions). Example (D = 4, k = 2):
118 |  27 | 1002 |   225
128 |  85 | 2000 |  8700
512 |  88 | 2500 | 10000
600 |  97 | 6500 | 24000
800 |  99 | 7024 | 25874
The first column should be given as input to the hash function and
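A minimal sketch of two common options (the function name and the modulus are illustrative): convert the list to a tuple, which is hashable, and reduce the built-in hash modulo k; or, if the bucket assignment must be deterministic across runs and platforms, use an explicit polynomial rolling hash as below.

def bucket_of(int_list, k, prime=1_000_003):
    # Map a list of positive integers to one of k buckets.
    # hash(tuple(int_list)) % k would also work for in-process use; the explicit
    # polynomial hash is deterministic and implementation-independent.
    h = 0
    for x in int_list:
        h = (h * 31 + x) % prime     # simple polynomial rolling hash
    return h % k

# Example with the first column from the question:
print(bucket_of([118, 128, 512, 600, 800], k=2))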

How does Locality Sensitive Hashing (LSH) work?

Submitted by 陌路散爱 on 2019-12-08 02:49:54
Question: I've already read this question, but unfortunately it didn't help. What I don't understand is what we do once we know which bucket our high-dimensional query vector q is assigned to: suppose that, using our family of locality-sensitive hash functions h_1, h_2, ..., h_n, we have translated q to a low-dimensional (n-dimensional) hash code c. Then c is the index of the bucket that q is assigned to, and where (hopefully) its nearest neighbors are also assigned; let's say that there are 100 vectors
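A minimal random-hyperplane LSH sketch (sizes and names are illustrative assumptions): the n-bit code c only selects a candidate bucket; the query is then compared exhaustively, with the true distance, against just the (hopefully few) vectors stored in that bucket, and the closest candidate is returned.

import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
d, n_bits = 128, 16
planes = rng.normal(size=(n_bits, d))          # one random hyperplane per bit

def code(v):
    # Sign of the projection onto each hyperplane -> an n_bits-bit bucket index.
    bits = (planes @ v > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# Index: hash every database vector into its bucket.
data = rng.normal(size=(5000, d))
buckets = defaultdict(list)
for i, v in enumerate(data):
    buckets[code(v)].append(i)

# Query: look up q's bucket, then do an exact linear scan over that bucket only.
q = rng.normal(size=d)
candidates = buckets.get(code(q), [])
best = min(candidates, key=lambda i: np.linalg.norm(data[i] - q), default=None)
print(len(candidates), best)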

How to speed up nearest-neighbor search in Pandas (perhaps by vectorizing code)

Submitted by 依然范特西╮ on 2019-12-07 16:04:08
Question: I have two dataframes. Each one contains locations (X, Y) and a value for that point. For each point in the first dataframe I want to find the closest point in the second dataframe and then compute the difference. I have code that works, but it uses a for loop, which is slow. Any suggestions for how to speed this up? I know that it is generally a good idea to get rid of for loops in pandas for performance, but I don't see how to do that in this case. Here is some sample code: import pandas
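A minimal sketch of the usual vectorized replacement for the for loop (column names and sizes are illustrative, not the asker's data): build a scipy cKDTree on the second dataframe's coordinates, query it with all points of the first dataframe in one call, then take the difference of the matched values.

import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

df1 = pd.DataFrame({'X': np.random.rand(1000), 'Y': np.random.rand(1000),
                    'value': np.random.rand(1000)})
df2 = pd.DataFrame({'X': np.random.rand(500), 'Y': np.random.rand(500),
                    'value': np.random.rand(500)})

tree = cKDTree(df2[['X', 'Y']].values)                 # build once on the second frame
dist, idx = tree.query(df1[['X', 'Y']].values, k=1)    # one vectorized query, no Python loop
df1['nearest_value'] = df2['value'].values[idx]
df1['diff'] = df1['value'] - df1['nearest_value']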