For every point in an array, find the closest point to it in a second array and output that index

庸人自扰 2020-12-22 02:12

If I have two arrays:

X = np.random.rand(10000,2)
Y = np.random.rand(10000,2)

How can I, for each point in X, find out which point in Y is closest to it?

2 Answers
  •  北海茫月
    2020-12-22 02:44

    This has to be the most asked numpy question (I've answered it myself twice in the last week), but since it can be phrased a million ways:

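    import numpy as np
    from scipy.spatial.distance import cdist
    
    def withScipy(X, Y):  # faster
        # cdist returns shape (len(X), len(Y)); argmin over axis 1 gives one Y index per X point
        return np.argmin(cdist(X, Y, 'sqeuclidean'), axis=1)
    
    def withoutScipy(X, Y):  # slower, using broadcasting
        # the difference array has shape (len(Y), len(X), n_dims); argmin over axis 0 gives one Y index per X point
        return np.argmin(np.sum((X[None, :, :] - Y[:, None, :])**2, axis=-1), axis=0)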
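
To make the axis convention concrete, here is a minimal sketch on deliberately unequal array sizes (three points in X, two in Y, values made up for illustration). The helper name `closest_index_bruteforce` is hypothetical, and it puts X on the first axis rather than the second, so the reduction runs over axis 1; with equal-sized arrays, a swapped axis would go unnoticed because the output shape is the same.

```python
import numpy as np

def closest_index_bruteforce(X, Y):
    """For each row of X, return the index of the nearest row of Y (hypothetical helper)."""
    # Broadcasting builds all pairwise differences, shape (len(X), len(Y), n_dims)
    diff = X[:, None, :] - Y[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)   # squared distances, shape (len(X), len(Y))
    return np.argmin(sq_dist, axis=1)      # one Y index per X point

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.9, 0.1]])
Y = np.array([[1.0, 0.0], [0.0, 0.1]])
idx = closest_index_bruteforce(X, Y)
print(idx)  # [1 0 0]
```

The intermediate difference array holds len(X) * len(Y) * n_dims values, so for two 10000-by-2 float64 arrays it needs about 1.6 GB.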
    

    There's also a numpy-only method using einsum that's faster than my broadcasting function (but not faster than cdist), but I don't understand it well enough to explain it.

    EDIT += 21 months:

    Algorithmically, though, the best way to do this is with a KDTree, which avoids computing all len(X) * len(Y) pairwise distances.

    from sklearn.neighbors import KDTree 
    # since the sklearn implementation allows return_distance = False, saving memory
    
    y_tree = KDTree(Y)
    y_index_of_closest = y_tree.query(X, k = 1, return_distance = False)
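
If scikit-learn is not available, a similar lookup can be sketched with SciPy's `cKDTree`. This is a swapped-in alternative, not the sklearn call above: `cKDTree.query` always returns distances along with indices, and with `k=1` both come back as flat arrays of length len(X). The array sizes and the seed below are arbitrary choices for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)       # fixed seed, only for reproducibility
X = rng.random((1000, 2))
Y = rng.random((800, 2))

tree = cKDTree(Y)                    # build the tree once, on the points being searched
dist, nearest = tree.query(X, k=1)   # Euclidean distance and Y index for each X point

print(nearest.shape)                 # (1000,)
```

Here `nearest[i]` is the index into Y of the point closest to `X[i]`, and `dist[i]` is the corresponding Euclidean distance.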
    

    @HansMusgrave has a pretty good speedup for KDTree below.

    And for completeness' sake, the np.einsum answer, which I now understand:

    np.argmin(                                      #   (X - Y) ** 2
        np.einsum('ij, ij ->i', X, X)[:, None] +    # = X ** 2      (one value per row of X, as a column)
        np.einsum('ij, ij ->i', Y, Y)          -    # + Y ** 2      (one value per row of Y, as a row)
        2 * X.dot(Y.T),                             # - 2 * X * Y   (all pairwise dot products)
        axis = 1)
    

    @Divakar explains this method well on the wiki page of his package eucl_dist
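
As a rough check of the einsum expression, the sketch below wraps it in a function and compares it against a plain broadcasting computation on small random inputs. The function name `closest_index_einsum`, the array sizes, and the seed are all assumptions made for illustration. It relies on the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, which never materialises the (len(X), len(Y), n_dims) difference array.

```python
import numpy as np

def closest_index_einsum(X, Y):
    """Index of the nearest row of Y for each row of X (hypothetical helper name)."""
    x_sq = np.einsum('ij,ij->i', X, X)[:, None]   # squared norms of X rows, shape (len(X), 1)
    y_sq = np.einsum('ij,ij->i', Y, Y)[None, :]   # squared norms of Y rows, shape (1, len(Y))
    cross = X @ Y.T                               # pairwise dot products, shape (len(X), len(Y))
    sq_dist = x_sq + y_sq - 2.0 * cross           # expanded squared distances
    return np.argmin(sq_dist, axis=1)

rng = np.random.default_rng(0)                    # fixed seed, only for reproducibility
X = rng.random((500, 2))
Y = rng.random((300, 2))
idx = closest_index_einsum(X, Y)

# Reference: direct squared distances via broadcasting
ref = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
print(np.allclose(ref[np.arange(len(X)), idx], ref.min(axis=1)))
```

Because the expanded form subtracts nearly equal numbers, the computed squared distances can differ from the direct ones by rounding error (and can even come out slightly negative), so the comparison is made on distances with a tolerance rather than on exact index equality.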
