How to find indices of a reordered numpy array?

问题

Say I have a sorted numpy array:

arr = np.array([0.0, 0.0],
               [0.5, 0.0],
               [1.0, 0.0],
               [0.0, 0.5],
               [0.5, 0.5],
               [1.0, 0.5],
               [0.0, 1.0],
               [0.5, 1.0],
               [1.0, 1.0])

and suppose I make a non trivial operation on it such that I have a new array which is the same as the old one but in another order:

arr2 = np.array([0.5, 0.0],
                [0.0, 0.0],
                [0.0, 0.5],
                [1.0, 0.0],
                [0.5, 0.5],
                [1.0, 0.5],
                [0.0, 1.0],
                [1.0, 1.0],
                [0.5, 1.0])

The question is: how do you get the indices of where each element of arr2 are placed in arr. In other terms, I want a method that takes both arrays and return an array the same length as arr2 but with the index of the element of arr. For example, the first element of the returned array would be the index of the first element of arr2 in arr.

where_things_are(arr2, arr) 
return : array([1, 0, 3, 2, 4, 5, 6, 8, 7])

Does a function like this already exists in numpy?

EDIT:

I tried:

np.array([np.where((arr == x).all(axis=1)) for x in arr2])

which returns what I want, but my question still holds: is there a more efficient way of doing this using numpy methods?

EDIT2:

It should also work if the length of arr2 is not the same as the length of the original array (like if I removed some elements from it). Thus it is not finding and inverting a permutation but rather finding where elements are located at.

回答1:

The key is inverting permutations. The code below works even if the original array is not sorted. If it is sorted then find_map_sorted can be used which obviously is faster.

UPDATE: Adapting to the OP's ever changing requirements, I've added a branch that handles lost elements.

import numpy as np

def invperm(p):
    q = np.empty_like(p)
    q[p] = np.arange(len(p))
    return q

def find_map(arr1, arr2):
    o1 = np.argsort(arr1)
    o2 = np.argsort(arr2)
    return o2[invperm(o1)]

def find_map_2d(arr1, arr2):
    o1 = np.lexsort(arr1.T)
    o2 = np.lexsort(arr2.T)
    return o2[invperm(o1)]

def find_map_sorted(arr1, arrs=None):
    if arrs is None:
        o1 = np.lexsort(arr1.T)
        return invperm(o1)
    # make unique-able
    rdtype = np.rec.fromrecords(arrs[:1, ::-1]).dtype
    recstack = np.r_[arrs[:,::-1], arr1[:,::-1]].view(rdtype).view(np.recarray)
    uniq, inverse = np.unique(recstack, return_inverse=True)
    return inverse[len(arrs):]

x1 = np.random.permutation(100000)
x2 = np.random.permutation(100000)
print(np.all(x2[find_map(x1, x2)] == x1))

rows = np.random.random((100000, 8))
r1 = rows[x1, :]
r2 = rows[x2, :]
print(np.all(r2[find_map_2d(r1, r2)] == r1))

rs = r1[np.lexsort(r1.T), :]
print(np.all(rs[find_map_sorted(r2), :] == r2))

# lose ten elements
print(np.all(rs[find_map_sorted(r2[:-10], rs), :] == r2[:-10]))

回答2:

Here is a way using numpy Broadcasting:

In [10]: ind = np.where(arr[:, None] == arr2[None, :])[1]

In [11]: ind[np.where(np.diff(ind)==0)]
Out[11]: array([1, 0, 3, 2, 4, 5, 6, 8, 7])

The idea behind this is, increasing the dimension of arrays so that their comparison produces a 3d array which since the original sub-array have length 2 if we had two consecutive equal items in second axis of the result of comparison they would be where both items are equal. For a better demonstration here is the result of comparison without selecting the second axis:

In [96]: np.where(arr[:, None] == arr2[None, :])
Out[96]: 
(array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,
        3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7,
        7, 7, 8, 8, 8, 8, 8, 8]),
 array([0, 1, 1, 2, 3, 6, 0, 0, 1, 3, 4, 8, 0, 1, 3, 3, 5, 7, 1, 2, 2, 4, 5,
        6, 0, 2, 4, 4, 5, 8, 2, 3, 4, 5, 5, 7, 1, 2, 6, 6, 7, 8, 0, 4, 6, 7,
        8, 8, 3, 5, 6, 7, 7, 8]),
 array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1,
        0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
        0, 1, 0, 0, 1, 0, 1, 1]))

And then for finding those items we just need to find the places that their diff is 0.

回答3:

The numpy_indexed package (disclaimer: i am its author) contains efficient functionality for exactly this type of problem; npi.indices is the ndarray-equivalent of list.index.

import numpy_indexed as npi
idx = npi.indices(arr, arr2)

This returns a list of indices such that arr[idx] == arr2. If arr2 contains elements not present in arr, a ValueError is raised; but you can control that with the 'missing' kwarg.

To answer your question if this functionality is included in numpy; yes, in the sense that numpy is a turing-complete ecosystem. But not really, if you count the number of lines of code required to implement this in an efficient, correct and general manner.

回答4:

If you guarantee uniqueness:

[ np.where(np.logical_and((arr2==x)[:,1], (arr2==x)[:,0])==True)[0][0] for x in arr]

Notice that, I converted your array to 2D: e.g.

arr2 = np.array([[0.5, 0.0],
[0.0, 0.0],
[0.0, 0.5],
[1.0, 0.0],
[0.5, 0.5],
[1.0, 0.5],
[0.0, 1.0],
[1.0, 1.0],
[0.5, 1.0]])

来源：https://stackoverflow.com/questions/42232540/how-to-find-indices-of-a-reordered-numpy-array

标签

python

arrays

sorting

numpy

multidimensional-array