I have an array of N-dimensional vectors.
data = np.array([[5, 6, 1], [2, 0, 8], [4, 9, 3]])
In [1]: data
Out[1]:
array([[5, 6, 1],
       [2, 0, 8],
       [4, 9, 3]])
As a slight improvement over the otherwise very good answer by DSM, instead of using np.argsort(), it is more efficient to use np.argpartition() if the order of the N greatest values is of no consequence.
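Partitioning an array arr at index i rearranges its elements so that the element at index i is the one that would be there if arr were fully sorted in ascending order; every element to its left is less than or equal to it, and every element to its right is greater than or equal to it. The partitions on the left and right are not necessarily sorted. This has the advantage that it runs in linear time, rather than the O(n log n) of a full sort. Passing a negative index such as -N places the N greatest elements in the last N positions.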
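A small illustration of those guarantees using np.partition (np.argpartition returns the corresponding indices instead of the values). The array `arr` and the indices `k` and `n` are made-up example values.

```python
import numpy as np

arr = np.array([7, 2, 9, 4, 1, 8])  # made-up example values
k = 3

part = np.partition(arr, k)
# part[k] is exactly the value a full ascending sort would put at index k
print(part[k])  # 7
# Everything left of k is <= part[k], everything right is >= part[k];
# neither side is guaranteed to be sorted
assert (part[:k] <= part[k]).all() and (part[k:] >= part[k]).all()

# argpartition returns indices that produce such an arrangement
idx = np.argpartition(arr, k)
assert arr[idx[k]] == part[k]

# A negative kth moves the n greatest values into the last n slots
n = 2
top = arr[np.argpartition(arr, -n)[-n:]]
print(sorted(top.tolist()))  # [8, 9]
```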
I'd ravel, argsort, and then unravel. I'm not claiming this is the best way, only that it's the first way that occurred to me, and I'll probably delete it in shame after someone posts something more obvious. :-)
That said (choosing the top 2 values, arbitrarily, with numpy imported as np and sklearn.metrics imported):
In [73]: dists = sklearn.metrics.pairwise_distances(data)
In [74]: dists[np.tril_indices_from(dists, -1)] = 0
In [75]: dists
Out[75]:
array([[ 0. , 9.69535971, 3.74165739],
[ 0. , 0. , 10.48808848],
[ 0. , 0. , 0. ]])
In [76]: ii = np.unravel_index(np.argsort(dists.ravel())[-2:], dists.shape)
In [77]: ii
Out[77]: (array([0, 1]), array([1, 2]))
In [78]: dists[ii]
Out[78]: array([ 9.69535971, 10.48808848])
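A self-contained variant of the same ravel/argsort idea, sketched under the assumption that only NumPy is available. Instead of zeroing the lower triangle, it indexes the strict upper triangle with np.triu_indices, so zeros from the diagonal or lower triangle can never be returned even if more values are requested than there are pairs. The helper name `top_n_pairs` is hypothetical.

```python
import numpy as np


def top_n_pairs(data, n):
    """Hypothetical helper: indices and distances of the n most distant
    pairs of rows in data, ordered with the largest distance last."""
    # Pairwise Euclidean distances via broadcasting
    diff = data[:, None, :] - data[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Row/column indices of the strict upper triangle: each pair appears once
    rows, cols = np.triu_indices(len(data), k=1)
    pair_dists = dists[rows, cols]
    # argsort is ascending, so the n largest distances are at the end
    order = np.argsort(pair_dists)[-n:]
    return rows[order], cols[order], pair_dists[order]


data = np.array([[5, 6, 1], [2, 0, 8], [4, 9, 3]])
rows, cols, d = top_n_pairs(data, 2)
print(rows, cols, d)  # same pairs and distances as Out[77] and Out[78]
```

Swapping np.argsort for np.argpartition inside the helper gives the linear-time behaviour described in the other answer, at the cost of losing the ordering.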