For every point in an array, find the closest point to it in a second array and output that index

后端未结

关注

 2  1553

庸人自扰 2020-12-22 02:12

If I have two arrays:

X = np.random.rand(10000,2)
Y = np.random.rand(10000,2)

How can I, for each point in X, find out which point in Y is

2条回答

北海茫月 (楼主)

2020-12-22 02:44

This has to be the most asked numpy question (I've answered it myself twice in the last week), but since it can be phrased a million ways:

import numpy as np
import scipy.spatial.distance.cdist as cdist

def withScipy(X,Y):  # faster
    return np.argmin(cdist(X,Y,'sqeuclidean'),axis=0)

def withoutScipy(X,Y): #slower, using broadcasting
    return np.argmin(np.sum((X[None,:,:]-Y[:,None,:])**2,axis=-1), axis=0)

There's also a numpy-only method using einsum that's faster than my function (but not cdist) but I don't understand it well enough to explain it.

EDIT += 21 months:

The best way to do this algorithmically though is with KDTree.

from sklearn.neighbors import KDTree 
# since the sklearn implementation allows return_distance = False, saving memory

y_tree = KDTree(Y)
y_index_of_closest = y_tree.query(X, k = 1, return_distance = False)

@HansMusgrave has a pretty good speedup for KDTree below.

And for completion's sake, the np.einsum answer, which I now understand:

np.argmin(                                      #  (X - Y) ** 2 
    np.einsum('ij, ij ->i', X, X)[:, None] +    # = X ** 2        \
    np.einsum('ij, ij ->i', Y, Y)          -    # + Y ** 2        \
    2 * X.dot(Y.T),                             # - 2 * X * Y
    axis = 1)

@Divakar explains this method well on the wiki page of his package eucl_dist

0 讨论(0)

查看其它2个回答