Question
Say I have a sorted numpy array:
arr = np.array([[0.0, 0.0],
                [0.5, 0.0],
                [1.0, 0.0],
                [0.0, 0.5],
                [0.5, 0.5],
                [1.0, 0.5],
                [0.0, 1.0],
                [0.5, 1.0],
                [1.0, 1.0]])
and suppose I perform a non-trivial operation on it, so that I get a new array with the same elements as the old one but in another order:
arr2 = np.array([[0.5, 0.0],
                 [0.0, 0.0],
                 [0.0, 0.5],
                 [1.0, 0.0],
                 [0.5, 0.5],
                 [1.0, 0.5],
                 [0.0, 1.0],
                 [1.0, 1.0],
                 [0.5, 1.0]])
The question is: how do you get the index in arr of each element of arr2? In other words, I want a method that takes both arrays and returns an array of the same length as arr2, holding for each element of arr2 its index in arr. For example, the first element of the returned array would be the index in arr of the first element of arr2.
where_things_are(arr2, arr)
return : array([1, 0, 3, 2, 4, 5, 6, 8, 7])
Does a function like this already exist in numpy?
EDIT:
I tried:
np.array([np.where((arr == x).all(axis=1)) for x in arr2])
which returns what I want, but my question still holds: is there a more efficient way of doing this using numpy methods?
EDIT2:
It should also work if the length of arr2 is not the same as the length of the original array (for example, if I removed some elements from it). So the task is not finding and inverting a permutation, but rather finding where the elements are located.
Answer 1:
The key is inverting permutations. The code below works even if the original array is not sorted. If it is sorted, find_map_sorted can be used instead, which is faster because it sorts only one array.
UPDATE: Adapting to the OP's ever-changing requirements, I've added a branch that handles lost elements.
import numpy as np

def invperm(p):
    # invert a permutation: q[p[i]] == i
    q = np.empty_like(p)
    q[p] = np.arange(len(p))
    return q

def find_map(arr1, arr2):
    # 1-D case: returns idx such that arr2[idx] == arr1
    o1 = np.argsort(arr1)
    o2 = np.argsort(arr2)
    return o2[invperm(o1)]

def find_map_2d(arr1, arr2):
    # row-wise case: returns idx such that arr2[idx] == arr1
    o1 = np.lexsort(arr1.T)
    o2 = np.lexsort(arr2.T)
    return o2[invperm(o1)]

def find_map_sorted(arr1, arrs=None):
    # arrs, if given, is the lexsorted original; arr1 may have lost rows
    if arrs is None:
        o1 = np.lexsort(arr1.T)
        return invperm(o1)
    # make unique-able
    rdtype = np.rec.fromrecords(arrs[:1, ::-1]).dtype
    recstack = np.r_[arrs[:,::-1], arr1[:,::-1]].view(rdtype).view(np.recarray)
    uniq, inverse = np.unique(recstack, return_inverse=True)
    # ravel() keeps the result 1-D even if inverse comes back in the input's shape
    return inverse.ravel()[len(arrs):]

x1 = np.random.permutation(100000)
x2 = np.random.permutation(100000)
print(np.all(x2[find_map(x1, x2)] == x1))
rows = np.random.random((100000, 8))
r1 = rows[x1, :]
r2 = rows[x2, :]
print(np.all(r2[find_map_2d(r1, r2)] == r1))
rs = r1[np.lexsort(r1.T), :]
print(np.all(rs[find_map_sorted(r2), :] == r2))
# lose ten elements
print(np.all(rs[find_map_sorted(r2[:-10], rs), :] == r2[:-10]))
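To make the permutation idea concrete, here is a small sketch that first inverts a four-element permutation by hand and then applies the same lexsort approach to the arrays from the question (written as 2D arrays). The argument order `find_map_2d(arr2, arr)` is my reading of what the question asks for; treat it as an illustration under that assumption rather than a definitive recipe.

```python
import numpy as np

def invperm(p):
    # q[p[i]] == i, so q undoes the reordering described by p
    q = np.empty_like(p)
    q[p] = np.arange(len(p))
    return q

def find_map_2d(arr1, arr2):
    # returns idx such that arr2[idx] == arr1 (rows must be unique)
    o1 = np.lexsort(arr1.T)
    o2 = np.lexsort(arr2.T)
    return o2[invperm(o1)]

p = np.array([2, 0, 3, 1])
q = invperm(p)
print(q)      # [1 3 0 2]
print(p[q])   # [0 1 2 3]

arr = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
                [0.0, 0.5], [0.5, 0.5], [1.0, 0.5],
                [0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])
arr2 = np.array([[0.5, 0.0], [0.0, 0.0], [0.0, 0.5],
                 [1.0, 0.0], [0.5, 0.5], [1.0, 0.5],
                 [0.0, 1.0], [1.0, 1.0], [0.5, 1.0]])

# index in arr of each row of arr2
idx = find_map_2d(arr2, arr)
print(idx)    # [1 0 3 2 4 5 6 8 7]
```

Note that this only applies when both arrays hold the same rows; for the lost-elements case the find_map_sorted branch above is the one to use.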
Answer 2:
Here is a way using numpy Broadcasting:
In [10]: ind = np.where(arr[:, None] == arr2[None, :])[1]
In [11]: ind[np.where(np.diff(ind)==0)]
Out[11]: array([1, 0, 3, 2, 4, 5, 6, 8, 7])
The idea is to add a dimension to each array so that their comparison produces a 3D boolean array. Since each sub-array has length 2, a pair of rows that are equal in both items is reported twice in a row by np.where, once per item, so it shows up as two consecutive equal entries in the second index array. For a better demonstration, here is the result of the comparison without selecting the second index array:
In [96]: np.where(arr[:, None] == arr2[None, :])
Out[96]:
(array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,
3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7,
7, 7, 8, 8, 8, 8, 8, 8]),
array([0, 1, 1, 2, 3, 6, 0, 0, 1, 3, 4, 8, 0, 1, 3, 3, 5, 7, 1, 2, 2, 4, 5,
6, 0, 2, 4, 4, 5, 8, 2, 3, 4, 5, 5, 7, 1, 2, 6, 6, 7, 8, 0, 4, 6, 7,
8, 8, 3, 5, 6, 7, 7, 8]),
array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1,
0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1,
0, 1, 0, 0, 1, 0, 1, 1]))
To find those items, we then just need to find the places where the diff of that index array is 0.
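As a rough sketch of a variant on the same broadcasting idea, the comparison can be reduced with `.all(axis=2)` instead of looking for zeros in the diff. This is not the code shown above; the names `match`, `idx` and `found` are mine, and the arrays are the 2D versions of the ones in the question. It builds an intermediate boolean array of shape (len(arr2), len(arr), 2), so it is only practical for moderately sized inputs.

```python
import numpy as np

arr = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
                [0.0, 0.5], [0.5, 0.5], [1.0, 0.5],
                [0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])
arr2 = np.array([[0.5, 0.0], [0.0, 0.0], [0.0, 0.5],
                 [1.0, 0.0], [0.5, 0.5], [1.0, 0.5],
                 [0.0, 1.0], [1.0, 1.0], [0.5, 1.0]])

# match[i, j] is True when row i of arr2 equals row j of arr in every column
match = (arr2[:, None, :] == arr[None, :, :]).all(axis=2)

# first matching row of arr for each row of arr2
idx = match.argmax(axis=1)
# False for rows of arr2 that do not occur in arr (idx is meaningless there)
found = match.any(axis=1)

print(idx)          # [1 0 3 2 4 5 6 8 7]
print(found.all())  # True
```

Because each row of arr2 is handled independently, the same lines also work when arr2 is shorter than arr.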
Answer 3:
The numpy_indexed package (disclaimer: I am its author) contains efficient functionality for exactly this type of problem; npi.indices is the ndarray equivalent of list.index.
import numpy_indexed as npi
idx = npi.indices(arr, arr2)
This returns a list of indices such that arr[idx] == arr2. If arr2 contains elements not present in arr, a ValueError is raised; but you can control that with the 'missing' kwarg.
To answer your question of whether this functionality is included in numpy: yes, in the sense that numpy is a Turing-complete ecosystem. But not really, if you count the number of lines of code required to implement this in an efficient, correct and general manner.
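For a sense of what a plain-numpy version involves, here is a minimal sketch. It is not how numpy_indexed is implemented; `row_indices` is a hypothetical helper name, returning -1 for missing rows is my own convention, and it assumes 2D inputs with a non-empty first array.

```python
import numpy as np

def row_indices(haystack, needles):
    """For each row of needles, return its row index in haystack, or -1 if absent."""
    stacked = np.concatenate([haystack, needles], axis=0)
    # give every distinct row an integer label
    _, labels = np.unique(stacked, axis=0, return_inverse=True)
    labels = labels.ravel()
    hay_labels = labels[:len(haystack)]
    needle_labels = labels[len(haystack):]
    # sort the haystack labels once, then binary-search every needle label
    order = np.argsort(hay_labels, kind="stable")
    sorted_hay = hay_labels[order]
    pos = np.searchsorted(sorted_hay, needle_labels)
    pos = np.clip(pos, 0, len(sorted_hay) - 1)
    found = sorted_hay[pos] == needle_labels
    return np.where(found, order[pos], -1)

arr = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
                [0.0, 0.5], [0.5, 0.5], [1.0, 0.5],
                [0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])
arr2 = np.array([[0.5, 0.0], [0.0, 0.0], [0.0, 0.5],
                 [1.0, 0.0], [0.5, 0.5], [1.0, 0.5],
                 [0.0, 1.0], [1.0, 1.0], [0.5, 1.0]])

print(row_indices(arr, arr2))       # [1 0 3 2 4 5 6 8 7]
print(row_indices(arr, arr2[:-2]))  # [1 0 3 2 4 5 6]
```

Even this short version has to deal with row labelling, sorting, and missing elements, which is roughly the point made above.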
Answer 4:
If you guarantee uniqueness:
[np.where(np.logical_and((arr2 == x)[:, 1], (arr2 == x)[:, 0]))[0][0] for x in arr]
Note that I treat your arrays as 2D, e.g.:
arr2 = np.array([[0.5, 0.0],
[0.0, 0.0],
[0.0, 0.5],
[1.0, 0.0],
[0.5, 0.5],
[1.0, 0.5],
[0.0, 1.0],
[1.0, 1.0],
[0.5, 1.0]])
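If uniqueness is guaranteed, the per-element np.where scan could also be replaced by a dictionary lookup, which avoids rescanning the whole array for every row. This is a sketch of a different technique from the one-liner above, not a rewrite of it; the names `position` and `idx` are mine, and it looks up each row of arr2 in arr, as the question asks.

```python
import numpy as np

arr = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
                [0.0, 0.5], [0.5, 0.5], [1.0, 0.5],
                [0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])
arr2 = np.array([[0.5, 0.0], [0.0, 0.0], [0.0, 0.5],
                 [1.0, 0.0], [0.5, 0.5], [1.0, 0.5],
                 [0.0, 1.0], [1.0, 1.0], [0.5, 1.0]])

# map each row of arr (as a hashable tuple) to its position
position = {tuple(row): i for i, row in enumerate(arr.tolist())}

# look up every row of arr2; a row missing from arr raises KeyError
idx = np.array([position[tuple(row)] for row in arr2.tolist()])
print(idx)  # [1 0 3 2 4 5 6 8 7]
```

The loops run in Python rather than in numpy, so this trades vectorization for a single pass over each array.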
Source: https://stackoverflow.com/questions/42232540/how-to-find-indices-of-a-reordered-numpy-array