Given two (large) sets of points, how can I efficiently find pairs that are nearest to each other?

Question

I need to solve a computational problem that boils down to searching for reciprocally-nearest pairs of points between two sets. The problem goes something like this:

Given a set of points A and a set of points B in Euclidean space, find all pairs (a, b) such that b is the closest point in B to a and a is the closest point in A to b.

The sets A and B are of approximately equal size, and we will call this size N. For my particular problem N is approximately 250,000.

The brute force solution is to compare every point against every other point, which has quadratic time complexity. Is there any more efficient algorithm for doing this?

Answer 1:

A data structure I found very useful when I had to do nearest-neighbour searches was a kd-tree. Wikipedia has a nice overview, and this is an excellent in-depth discussion of the algorithm if you're implementing your own (although a library may well exist already; you don't mention which language you're using). The most important thing about a kd-tree is that it allows nearest-neighbour searches to be performed in O(log N) time on average, for reasonably well-distributed points in low dimensions.
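
A minimal pure-Python sketch of a kd-tree with a nearest-neighbour query, for illustration only. The names Node, build and nearest are hypothetical, not from any library; for N around 250,000 a compiled implementation such as SciPy's scipy.spatial.cKDTree would be the practical choice. This version re-sorts at every level, so building costs O(N log² N) rather than the O(N log N) achievable with presorting or linear-time median selection.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]

class Node:
    """One kd-tree node: a stored point, its index in the original list, and the split axis."""
    def __init__(self, point: Point, index: int, axis: int,
                 left: Optional["Node"], right: Optional["Node"]):
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right

def build(items: List[Tuple[Point, int]], depth: int = 0) -> Optional[Node]:
    """Build a kd-tree from (point, index) pairs, splitting at the median and
    cycling through the axes level by level."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, index = items[mid]
    return Node(point, index, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))

def nearest(root: Optional[Node], target: Sequence[float]) -> Tuple[float, int]:
    """Return (squared distance, index) of the stored point closest to target."""
    best = (math.inf, -1)

    def visit(node: Optional[Node]) -> None:
        nonlocal best
        if node is None:
            return
        d2 = sum((p - t) ** 2 for p, t in zip(node.point, target))
        if d2 < best[0]:
            best = (d2, node.index)
        diff = target[node.axis] - node.point[node.axis]
        # Search the side of the splitting plane that contains the target first.
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Only cross the splitting plane if it is closer than the best match so far.
        if diff * diff < best[0]:
            visit(far)

    visit(root)
    return best
```

Usage: build the tree once with `tree = build([(p, i) for i, p in enumerate(B)])`, then each `nearest(tree, a)` returns the squared distance and the index in B of the point closest to `a`.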

In that way, you could produce two lists in O(N log N) time: the members of A paired with their nearest neighbour in B, and the members of B paired with their nearest neighbour in A. Then you could compare the lists to see which pairs match. Done naively, by searching one list for every entry of the other, that's O(N^2), though you might be able to think of a way to do it faster.
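
One way to make the comparison faster, assuming points are stored as hashable tuples (match_lists is a hypothetical name for illustration): put one list into a hash set and probe it with the other, which takes expected O(N) time.

```python
def match_lists(a_to_b, b_to_a):
    """a_to_b holds (a, nearest b) pairs; b_to_a holds (b, nearest a) pairs.
    Hashing one list turns the O(N^2) pairwise comparison into expected O(N)."""
    forward = set(a_to_b)
    return [(a, b) for b, a in b_to_a if (a, b) in forward]

a_to_b = [((0, 0), (1, 0)), ((5, 5), (1, 0))]
b_to_a = [((1, 0), (0, 0))]
print(match_lists(a_to_b, b_to_a))  # [((0, 0), (1, 0))]
```

If the lists store indices instead of points (nn_ab[i] is the index in B nearest to A[i], nn_ba[j] the index in A nearest to B[j]), no hashing is needed: pair i is reciprocal exactly when nn_ba[nn_ab[i]] == i, an O(1) check per point.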

[edit] You've got me thinking; here's my second thought:

for(a in A)
    b := nearest(B, a)
    if a = nearest(A, b)
        add (a, b) to results
    end if
end for

function nearest(X, y)
    return nearest member of set X to point y
end function

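By my reckoning, that's O(N log N): building a kd-tree for each set costs O(N log N), and the loop makes 2N nearest-neighbour queries at O(log N) each.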
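
A runnable Python translation of the pseudocode, assuming points are tuples and results are returned as index pairs. brute_nearest is a hypothetical placeholder doing an O(N) linear scan, so this version as written is O(N²); passing in any function with the same (points, q) -> index signature that queries a prebuilt kd-tree gives the O(N log N) behaviour described.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]

def brute_nearest(points: Sequence[Point], q: Point) -> int:
    """Placeholder for nearest(X, y): linear scan; min() keeps the lowest index on ties."""
    return min(range(len(points)), key=lambda k: math.dist(points[k], q))

def reciprocal_pairs(A: Sequence[Point], B: Sequence[Point],
                     nearest: Callable[[Sequence[Point], Point], int] = brute_nearest
                     ) -> List[Tuple[int, int]]:
    """Direct translation of the loop above: keep (i, j) when B[j] is nearest
    to A[i] and A[i] is nearest to B[j]."""
    results = []
    for i, a in enumerate(A):
        j = nearest(B, a)              # b := nearest(B, a)
        if nearest(A, B[j]) == i:      # if a = nearest(A, b)
            results.append((i, j))
    return results

print(reciprocal_pairs([(0, 0), (10, 0)], [(1, 0), (2, 0), (11, 0)]))  # [(0, 0), (1, 2)]
```

Comparing indices rather than coordinates keeps the check well defined even if A contains duplicate points.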

Answer 2:

Sorry for picking up a rather old thread, but I wanted to add a solution I found in my textbook for an Algorithm Design class:

There is a divide-and-conquer approach (think merge sort) to this problem that should be O(n log n). I've only seen it used to find the shortest distance within one set of points, but it should be easy to adapt so that each pair must consist of points from different sets.

Sort all points according to X-value.
Split the whole set in two equal parts.
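Recurse on each half and let d be the smaller of the two minimal distances.
Let p be the right-most point in the left half. Check the distances between points whose x-coordinates lie between p_x - d and p_x + d (only pairs that straddle the split can beat d here); if any of these distances is shorter than d, return the shortest one, otherwise return d. With the points in this strip sorted by y, each point only needs to be compared with a constant number (at most 7) of its successors, which keeps this step linear.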
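
A sketch of the single-set version described above (closest pair within one set of 2-D points, not yet adapted to two sets; closest_pair_distance and _closest are hypothetical names). For brevity it re-sorts the strip by y at every level, which costs O(n log² n); reaching O(n log n) requires maintaining the y-order by merging, as in merge sort.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def closest_pair_distance(points: List[Point]) -> float:
    """Smallest distance between any two points of a single 2-D set."""
    return _closest(sorted(points))            # sort by x (ties broken by y)

def _closest(pts: List[Point]) -> float:
    n = len(pts)
    if n <= 3:                                 # small case: brute force
        return min((math.dist(p, q) for i, p in enumerate(pts) for q in pts[i + 1:]),
                   default=math.inf)
    mid = n // 2
    p_x = pts[mid - 1][0]                      # x of the right-most point in the left half
    d = min(_closest(pts[:mid]), _closest(pts[mid:]))
    # Strip of points within d of the split line, ordered by y.
    strip = sorted((p for p in pts if abs(p[0] - p_x) < d), key=lambda p: p[1])
    for i, p in enumerate(strip):
        # Only the next 7 points in y-order can be closer than d.
        for q in strip[i + 1:i + 8]:
            if q[1] - p[1] >= d:
                break
            d = min(d, math.dist(p, q))
    return d

print(closest_pair_distance([(0, 0), (3, 4)]))  # 5.0
```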

Answer 3:

Old thread, but I see there is a pretty recent comment.

I believe that for an n-dimensional set of points, the nearest pair between two sets can be found by finding the point nearest to the origin in their Minkowski difference (the set of all vectors a - b). See the paper by Philip Wolfe of Bell Labs, where he lays out the algorithm. You can think of it as taking a random point in set A, finding the closest point to it in set B, then finding the closest point in A to that point in B, and so on. http://link.springer.com/article/10.1007%2FBF01580381
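
The alternating procedure in the last sentence can be sketched as follows. This is only that hop-by-hop walk, not Wolfe's polytope algorithm from the paper; nearest_index and walk_to_mutual_pair are hypothetical names, and the linear-scan helper stands in for a real spatial index. Because no hop increases the current distance, the walk always stops at a reciprocally-nearest pair, though not necessarily the closest pair overall.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]

def nearest_index(points: Sequence[Point], q: Point) -> int:
    """Hypothetical helper: linear scan; min() keeps the lowest index among ties."""
    return min(range(len(points)), key=lambda k: math.dist(points[k], q))

def walk_to_mutual_pair(A: Sequence[Point], B: Sequence[Point], start: int) -> Tuple[int, int]:
    """Alternate nearest-neighbour hops A -> B -> A -> ... from A[start] until the
    hop from B[j] lands back on A[i]. Returns indices (i, j) such that B[j] is
    nearest to A[i] and A[i] is nearest to B[j]."""
    i = start
    j = nearest_index(B, A[i])          # invariant: B[j] is nearest to A[i]
    while True:
        i_next = nearest_index(A, B[j])
        if i_next == i:
            return i, j
        # The distance never increases along the walk, and with lowest-index
        # tie-breaking the walk cannot cycle, so it terminates.
        i = i_next
        j = nearest_index(B, A[i])

print(walk_to_mutual_pair([(0, 0), (3, 0)], [(2, 0), (3.5, 0)], 0))  # (1, 1)
```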

Answer 4:

BSP / Octree / Kd-tree (thanks, TZHX)
Space-filling curves (e.g. a Hilbert curve); a Google search yields many relevant results for pair finding.
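
To illustrate the space-filling-curve idea, here is a sketch of the standard xy2d conversion from a grid cell to its position along a Hilbert curve, plus a helper that orders points by that position. It assumes 2-D points scaled into [0, 1) and a power-of-two grid size n (1024 here, an arbitrary choice); hilbert_index and hilbert_order are illustrative names. Points that are close along the curve are close in space, so after sorting A and B together, each point's neighbours in the sorted order give cheap candidate pairs. The converse does not always hold (nearby points can sit on opposite sides of a curve seam), so on its own this is an approximation.

```python
from typing import List, Sequence, Tuple

def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve filling an n x n grid
    (n must be a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_order(points: Sequence[Tuple[float, float]], n: int = 1024) -> List[int]:
    """Indices of 2-D points in [0, 1)^2, sorted by the Hilbert index of their grid cell."""
    def key(k: int) -> int:
        x, y = points[k]
        return hilbert_index(n, int(x * n), int(y * n))
    return sorted(range(len(points)), key=key)
```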

Source: https://stackoverflow.com/questions/5077318/given-two-large-sets-of-points-how-can-i-efficiently-find-pairs-that-are-near

Tags: algorithm, performance, computational-geometry