nearest-neighbor

How to find k nearest neighbors to the median of n distinct numbers in O(n) time?

做~自己de王妃 submitted on 2019-12-02 20:59:08
I can use the median-of-medians selection algorithm to find the median in O(n). I also know that after the algorithm is done, all the elements to the left of the median are less than the median and all the elements to the right are greater. But how do I find the k nearest neighbors to the median in O(n) time? If the median is m, the numbers to the left are less than m and the numbers to the right are greater than m, but neither side of the array is sorted. The numbers are any set of distinct numbers given by the user. The problem is from Introduction to Algorithms.
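One standard answer (this is the CLRS exercise 9.3-7 pattern): select the median in O(n), replace every element by its absolute distance to the median, select the k-th smallest distance in O(n), and keep the elements within that cutoff. The sketch below uses randomized quickselect (expected O(n)) where a worst-case-linear version would plug in median-of-medians; function names are illustrative.

```python
import random

def quickselect(a, k):
    """k-th smallest element of a (0-indexed), expected O(n) time."""
    pivot = random.choice(a)
    lows = [x for x in a if x < pivot]
    pivots = [x for x in a if x == pivot]
    highs = [x for x in a if x > pivot]
    if k < len(lows):
        return quickselect(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))

def k_nearest_to_median(a, k):
    """k elements of a closest to its (lower) median, expected O(n) overall."""
    med = quickselect(a, (len(a) - 1) // 2)
    dists = [abs(x - med) for x in a]
    # the k-th smallest absolute distance is the cutoff
    cutoff = quickselect(dists, k - 1)
    # a tie at the cutoff can admit one extra element; trim back to k
    return [x for x in a if abs(x - med) <= cutoff][:k]
```

Every step is a linear scan or a linear-time selection, so the whole thing stays O(n).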

Find all nearest neighbors within a specific distance

生来就可爱ヽ(ⅴ<●) submitted on 2019-12-02 20:52:38
I have a large list of x and y coordinates stored in a numpy array. Coordinates = [[ 60037633 289492298] [ 60782468 289401668] [ 60057234 289419794]] ... What I want is to find all nearest neighbors within a specific distance (let's say 3 meters) and store the result so that I can do some further analysis on it later. For most packages I found, it is necessary to decide how many NNs should be found, but I just want all of them within the set distance. How can I achieve something like that, and what is the fastest and best way to do it for a large dataset (some millions of points)?
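scipy's cKDTree answers exactly the fixed-radius variant without choosing a neighbor count up front: query_ball_point returns every neighbour inside the radius per point, and query_pairs returns all close pairs at once. A small sketch with made-up coordinates (units assumed to be meters):

```python
import numpy as np
from scipy.spatial import cKDTree

coordinates = np.array([[0.0, 0.0],
                        [1.0, 1.0],
                        [1.5, 1.2],
                        [10.0, 10.0]])

tree = cKDTree(coordinates)

# every unordered pair of points within 3 m of each other
pairs = tree.query_pairs(r=3.0)

# for each point, indices of all points (itself included) within 3 m
neighbours = tree.query_ball_point(coordinates, r=3.0)
```

Both calls scale to millions of points, since the tree is built once in O(n log n) and each radius query only touches nearby cells.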

How does the KD-tree nearest neighbor search work?

我与影子孤独终老i submitted on 2019-12-02 20:45:51
I am looking at the Wikipedia page for KD trees. As an example, I implemented, in Python, the algorithm listed for building a KD tree. The algorithm for doing a KNN search with a KD tree, however, switches languages and isn't totally clear. The English explanation starts making sense, but parts of it (such as the part where they "unwind recursion" to check other leaf nodes) don't really make sense to me. How does this work, and how can one do a KNN search with a KD tree in Python? This isn't meant to be a "send me the code!" type of question, and I don't expect that; just a brief explanation.
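The "unwind" step is just the return path of the recursion: you descend greedily toward the leaf that would contain the query, and as each call returns you check whether the splitting plane at that node is closer than your current best; only then can the far subtree contain a closer point, so only then do you descend it. A minimal tuple-based sketch of that logic (illustrative, not optimized):

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_kdtree(points, depth=0):
    """Node = (point, left-subtree, right-subtree), axis cycles with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    if node is None:
        return best
    point, left, right = node
    if best is None or dist2(point, target) < dist2(best, target):
        best = point
    axis = depth % len(target)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # the "unwind": visit the far side only if the splitting plane is
    # closer than the current best -- it could still hide a closer point
    if diff * diff < dist2(best, target):
        best = nearest(far, target, depth + 1, best)
    return best
```

Extending this to k > 1 just replaces the single best with a bounded max-heap of the k best seen so far, with the heap's worst distance used in the pruning test.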

How to find the closest 2 points in a 100 dimensional space with 500,000 points?

梦想与她 submitted on 2019-12-02 18:12:08
I have a database with 500,000 points in a 100-dimensional space, and I want to find the closest 2 points. How do I do it? Update: the space is Euclidean, sorry. And thanks for all the answers. BTW, this is not homework. You could try the ANN library, but that only gives reliable results up to 20 dimensions. Nikita Rybak: There is a chapter in Introduction to Algorithms devoted to finding the two closest points in two-dimensional space in O(n log n) time. You can check it out on Google Books. In fact, I suggest it for everyone, as the way they apply the divide-and-conquer technique to this problem is very…
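For intuition, here is the brute-force closest pair written with the expansion |a-b|² = |a|² + |b|² - 2a·b, which turns all pairwise distances into a single matrix product. At 500,000 points the full n×n matrix will not fit in memory, so in practice you would compute it in row blocks and keep a running minimum; the idea is the same. Sketch with toy data:

```python
import numpy as np

def closest_pair(points):
    """Indices of the two closest rows, via one vectorised distance matrix.
    O(n^2) memory: fine for a demo, block the rows for large n."""
    sq = np.einsum('ij,ij->i', points, points)          # |a|^2 per row
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)                        # ignore self-distances
    i, j = np.unravel_index(np.argmin(d2), d2.shape)
    return int(i), int(j)
```

In 100 dimensions, exact tree methods degrade toward this brute force anyway, which is why the blocked matrix version (or an approximate method) is a reasonable baseline.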

Finding closest point from other data frame

蹲街弑〆低调 submitted on 2019-12-02 07:46:26
I have two data frames, one with 0.8 million rows of x and y coordinates, the other with 70,000 rows of x and y coordinates. I want the logic and code in R to associate each data point from frame 1 with the closest point in data frame 2. Is there any standard package to do so? I am running a nested for loop, but this is very slow, as it iterates 0.8 million * 70,000 times. Hugo: I found a faster way to get the expected result using the data.table library: library(data.table); time0 <- Sys.time(). Here is some random data: df1 <- data…
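The same nearest-point join can also be done with a k-d tree: build the tree on the 70,000-row frame once, then query all 0.8 million points against it, which is roughly n log m work instead of n * m. A Python sketch of the idea with invented stand-in coordinates:

```python
import numpy as np
from scipy.spatial import cKDTree

# stand-ins for the coordinate columns of the two data frames
df1_xy = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])
df2_xy = np.array([[0.1, 0.2], [4.0, 6.0], [10.0, 0.0]])

tree = cKDTree(df2_xy)                 # index the smaller frame once
dist, idx = tree.query(df1_xy, k=1)    # nearest df2 row for every df1 row
```

idx then joins frame 1 to frame 2 row by row; the same structure exists in R via packages such as RANN or FNN.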

Rotate an image in C++ without using OpenCV functions

时间秒杀一切 submitted on 2019-12-02 06:54:16
Question: I am trying to rotate an image without using OpenCV functions in C++. The rotation center need not be the center of the image; it could be a different point (offset from the image center). So far I have followed a variety of sources to do image interpolation, and I am aware of a source that does the job perfectly in MATLAB. I tried to mimic the same in C++ without OpenCV functions, but I am not getting the expected rotated image. Instead, my output appears like a small horizontal…
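For reference, nearest-neighbour rotation about an arbitrary center is usually done by inverse mapping: for every destination pixel, rotate its coordinates backwards into the source image and copy the nearest source pixel. Mapping source pixels forwards instead leaves holes and can collapse the output into exactly the kind of thin strip described. A Python sketch of the inverse mapping (single-channel, zero fill outside the image; names are illustrative):

```python
import math
import numpy as np

def rotate_nn(img, angle_deg, cx, cy):
    """Rotate a 2-D array about (cx, cy) with nearest-neighbour sampling.
    Inverse mapping: each output pixel pulls from the source image."""
    h, w = img.shape
    out = np.zeros_like(img)
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    for y in range(h):
        for x in range(w):
            # rotate the destination coordinate back into the source
            xs = c * (x - cx) + s * (y - cy) + cx
            ys = -s * (x - cx) + c * (y - cy) + cy
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < w and 0 <= yi < h:
                out[y, x] = img[yi, xi]
    return out
```

The C++ version is the same double loop; the usual bug is applying the forward rotation matrix here instead of its inverse.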

Scaling up an image using nearest-neighbor

旧时模样 submitted on 2019-12-02 04:52:53
I have been trying to make my program scale up an image. I had some problems allocating new space for my scaled image, but I think that is fixed. The problem I am having is that the program crashes when I try to send my image back from my temporary memory holder. The loaded image is placed in my struct Image: the pixels in img->pixels, the height in img->height and the width in img->width. I have no idea why the program crashes when I transfer the pixels from my tmp2 struct to my img struct, while it does not crash when I do the opposite. Here is the code: void makeBigger…
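Language aside, the usual crash in this kind of code is writing through a pixel buffer that is still sized for the old dimensions. The mapping itself is tiny once the output buffer is allocated at the new size: every output pixel indexes back into the source at its coordinate divided by the scale factor. A Python sketch of that nearest-neighbour mapping:

```python
import numpy as np

def scale_nn(img, factor):
    """Upscale by an integer factor: each output pixel copies the source
    pixel at the scaled-down coordinate (nearest neighbour)."""
    h, w = img.shape[:2]
    ys = np.arange(h * factor) // factor   # source row for each output row
    xs = np.arange(w * factor) // factor   # source column for each output column
    return img[ys][:, xs]
```

In the C version, the equivalent check is that the destination struct's pixels array is malloc'ed for width*factor by height*factor before the copy loop, and that its width/height fields are updated to match.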

How do you optimize this code for nn prediction?

允我心安 submitted on 2019-12-02 04:13:38
How do you optimize this code? At the moment it runs too slowly for the amount of data that goes through this loop. This code runs 1-nearest-neighbor: it predicts the label of training_element based on p_data_set.

# [x], [[x1], [x2], [x3]], [l1, l2, l3]
def prediction(training_element, p_data_set, p_label_set):
    temp = np.array([], dtype=float)
    for p in p_data_set:
        temp = np.append(temp, distance.euclidean(training_element, p))
    minIndex = np.argmin(temp)
    return p_label_set[minIndex]

Use a k-d tree for fast nearest-neighbour lookups, e.g. scipy.spatial.cKDTree: from scipy.spatial…
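Continuing that suggestion: build the cKDTree once over p_data_set and reuse it for every query, instead of recomputing all distances per element (the np.append inside the loop is also quadratic on its own, since it copies the array each iteration). A sketch assuming the same data layout as the question:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_predictor(p_data_set, p_label_set):
    """Returns a 1-NN classifier backed by a k-d tree built once."""
    tree = cKDTree(p_data_set)
    def predict(training_element):
        _, idx = tree.query(training_element, k=1)
        return p_label_set[idx]
    return predict
```

tree.query also accepts a whole array of query points, so a batch of predictions becomes a single vectorised call rather than a Python loop.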

Is LSH about transforming vectors to binary vectors for hamming distance?

拈花ヽ惹草 submitted on 2019-12-02 00:50:48
I read some papers about LSH and I know it is used for solving the approximate k-NN problem. We can divide the algorithm into two parts: 1. Given a vector in D dimensions (where D is big) of any value, translate it with a set of N (where N << D) hash functions into a binary vector in N dimensions. 2. Using Hamming distance, apply some search technique on the set of binary codes obtained from phase 1 to find the k-NN. The key point is that computing the Hamming distance for vectors in N dimensions is fast using XOR. Anyway, I have two questions: is point 1 still necessary if we use a binary…
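For concreteness, the random-hyperplane scheme (SimHash, an LSH family for cosine similarity) is one standard instance of phase 1: each of the N hyperplanes contributes one sign bit, and vectors with a small angle between them tend to agree on most bits. A minimal sketch (dimensions, seed, and names arbitrary):

```python
import numpy as np

def lsh_signature(vecs, planes):
    """One bit per hyperplane: the sign of the projection onto its normal."""
    return (vecs @ planes.T > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance between two bit vectors (XOR + popcount in spirit)."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
D, N = 100, 16                          # N << D
planes = rng.standard_normal((N, D))    # N random hyperplane normals

x = rng.standard_normal(D)
sig = lsh_signature(x[None, :], planes)[0]   # N-bit binary code for x
```

Packing the bits with np.packbits lets the Hamming step literally be a bytewise XOR followed by a popcount, which is where the speed claim in the question comes from.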