问题
Given two numpy
arrays of nx3
and mx3
, what is an efficient way to determine the row indices (counter) wherein the rows are common in the two arrays. For instance I have the following solution, which is significantly slow for not even much larger arrays
def arrangment(arr1,arr2):
hits = []
for i in range(arr2.shape[0]):
current_row = np.repeat(arr2[i,:][None,:],arr1.shape[0],axis=0)
x = current_row - arr1
for j in range(arr1.shape[0]):
if np.isclose(x[j,0],0.0) and np.isclose(x[j,1],0.0) and np.isclose(x[j,2],0.0):
hits.append(j)
return hits
It checks if rows of arr2
exist in arr1
and returns the row indices of arr1
where the rows match. I need this arrangement to be always sequentially ascending in terms of rows of arr2
. For instance given
arr1 = np.array([[-1., -1., -1.],
[ 1., -1., -1.],
[ 1., 1., -1.],
[-1., 1., -1.],
[-1., -1., 1.],
[ 1., -1., 1.],
[ 1., 1., 1.],
[-1., 1., 1.]])
arr2 = np.array([[-1., 1., -1.],
[ 1., 1., -1.],
[ 1., 1., 1.],
[-1., 1., 1.]])
The function should return:
[3, 2, 6, 7]
回答1:
quick and dirty answer
(arr1[:, None] == arr2).all(-1).argmax(0)
array([3, 2, 6, 7])
Better answer
Takes care of chance a row in arr2
doesn't match anything in arr1
t = (arr1[:, None] == arr2).all(-1)
np.where(t.any(0), t.argmax(0), np.nan)
array([ 3., 2., 6., 7.])
As pointed out by @Divakar np.isclose
accounts for rounding error in comparing floats
t = np.isclose(arr1[:, None], arr2).all(-1)
np.where(t.any(0), t.argmax(0), np.nan)
回答2:
I had a similar problem in the past and I came up with a fairly optimised solution for it.
First you need a generalisation of numpy.unique
for multidimensional arrays, which for the sake of completeness I would copy it here
def unique2d(arr,consider_sort=False,return_index=False,return_inverse=False):
"""Get unique values along an axis for 2D arrays.
input:
arr:
2D array
consider_sort:
Does permutation of the values within the axis matter?
Two rows can contain the same values but with
different arrangements. If consider_sort
is True then those rows would be considered equal
return_index:
Similar to numpy unique
return_inverse:
Similar to numpy unique
returns:
2D array of unique rows
If return_index is True also returns indices
If return_inverse is True also returns the inverse array
"""
if consider_sort is True:
a = np.sort(arr,axis=1)
else:
a = arr
b = np.ascontiguousarray(a).view(np.dtype((np.void,
a.dtype.itemsize * a.shape[1])))
if return_inverse is False:
_, idx = np.unique(b, return_index=True)
else:
_, idx, inv = np.unique(b, return_index=True, return_inverse=True)
if return_index == False and return_inverse == False:
return arr[idx]
elif return_index == True and return_inverse == False:
return arr[idx], idx
elif return_index == False and return_inverse == True:
return arr[idx], inv
else:
return arr[idx], idx, inv
Now all you need is to concatenate (np.vstack
) your arrays and find the unique rows. The reverse mapping together with np.searchsorted
will give you the indices you need. So lets write another function similar to numpy.in2d
but for multidimensional (2D) arrays
def in2d_unsorted(arr1, arr2, axis=1, consider_sort=False):
"""Find the elements in arr1 which are also in
arr2 and sort them as the appear in arr2"""
assert arr1.dtype == arr2.dtype
if axis == 0:
arr1 = np.copy(arr1.T,order='C')
arr2 = np.copy(arr2.T,order='C')
if consider_sort is True:
sorter_arr1 = np.argsort(arr1)
arr1 = arr1[np.arange(arr1.shape[0])[:,None],sorter_arr1]
sorter_arr2 = np.argsort(arr2)
arr2 = arr2[np.arange(arr2.shape[0])[:,None],sorter_arr2]
arr = np.vstack((arr1,arr2))
_, inv = unique2d(arr, return_inverse=True)
size1 = arr1.shape[0]
size2 = arr2.shape[0]
arr3 = inv[:size1]
arr4 = inv[-size2:]
# Sort the indices as they appear in arr2
sorter = np.argsort(arr3)
idx = sorter[arr3.searchsorted(arr4, sorter=sorter)]
return idx
Now all you need to do is call in2d_unsorted
with your input parameters
>>> in2d_unsorted(arr1,arr2)
array([ 3, 2, 6, 7])
While may not be fully optimised this approach is much faster. Let's benchmark it against @piRSquared
s solutions
def indices_piR(arr1,arr2):
t = np.isclose(arr1[:, None], arr2).all(-1)
return np.where(t.any(0), t.argmax(0), np.nan)
with the following arrays
n=150
arr1 = np.random.permutation(n).reshape(n//3, 3)
idx = np.random.permutation(n//3)
arr2 = arr1[idx]
In [13]: np.allclose(in2d_unsorted(arr1,arr2),indices_piR(arr1,arr2))
True
In [14]: %timeit indices_piR(arr1,arr2)
10000 loops, best of 3: 181 µs per loop
In [15]: %timeit in2d_unsorted(arr1,arr2)
10000 loops, best of 3: 85.7 µs per loop
Now, for n=1500
In [24]: %timeit indices_piR(arr1,arr2)
100 loops, best of 3: 10.3 ms per loop
In [25]: %timeit in2d_unsorted(arr1,arr2)
1000 loops, best of 3: 403 µs per loop
and for n=15000
In [28]: %timeit indices_piR(A,B)
1 loop, best of 3: 1.02 s per loop
In [29]: %timeit in2d_unsorted(arr1,arr2)
100 loops, best of 3: 4.65 ms per loop
So for largeish
arrays this is over 200X faster compared to @piRSquared
's vectorised solution.
来源:https://stackoverflow.com/questions/41234161/check-common-elements-of-two-2d-numpy-arrays-either-row-or-column-wise