How to match pairs of values contained in two numpy arrays

為{幸葍}努か submitted on 2021-02-18 11:11:38

Question


I have two sets of coordinates and want to find out which coordinates in the coo set are identical to any coordinate in the targets set. I need the positions within coo, so the result should be either a list of indices or a list of bools.

import numpy as np

coo = np.array([[1,2],[1,6],[5,3],[3,6]]) # coordinates
targets = np.array([[5,3],[1,6]]) # coordinates of targets

print(np.isin(coo,targets))

[[ True False]
 [ True  True]
 [ True  True]
 [ True  True]]

The desired result would be one of the following two:

[False True True False] # bool list
[1,2] # list of concerning indices

My problem is that:

  • np.isin has no axis parameter, so I cannot pass axis=1.
  • even applying a logical AND to each row of the output would return True for the last element, which is wrong.
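
The second point can be reproduced with a short sketch (the variable names elementwise and naive are only illustrative):

```python
import numpy as np

coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])
targets = np.array([[5, 3], [1, 6]])

# np.isin tests each number against the flattened targets {5, 3, 1, 6},
# so the information about which numbers belong together is lost.
elementwise = np.isin(coo, targets)

# A row-wise AND flags [3, 6] as well, because 3 and 6 each occur
# somewhere in targets, just not together in one row.
naive = elementwise.all(axis=1)
print(naive)  # [False  True  True  True]
```

The last entry is a false positive, which is why the comparison has to be done on whole rows rather than on individual numbers.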

I am aware of loops and conditions, but I am sure Python offers a more elegant solution.


Answer 1:


This solution scales poorly for large arrays; in such cases the other proposed answers will perform better.


Here's one way taking advantage of broadcasting:

(coo[:,None] == targets).all(2).any(1)
# array([False,  True,  True, False])

Details

Check for every row in coo whether it matches a row in targets by direct comparison, after adding a new axis to coo so that it broadcasts against targets:

(coo[:,None] == targets)

array([[[False, False],
        [ True, False]],

       [[False, False],
        [ True,  True]],

       [[ True,  True],
        [False, False]],

       [[False, False],
        [False,  True]]])

Then check, along the last axis (axis 2), which row pairs have all values equal to True:

(coo[:,None] == targets).all(2)

array([[False, False],
       [False,  True],
       [ True, False],
       [False, False]])

Finally, use any along axis 1 to check which rows of coo match at least one row of targets.
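
Put together, the three steps might look like the sketch below. The intermediate names (matches, row_equal, mask, indices) are assumptions made for readability, and np.flatnonzero is used to get the index form that the question also asks for:

```python
import numpy as np

coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])
targets = np.array([[5, 3], [1, 6]])

# Shape (4, 2, 2): coo row x targets row x coordinate.
matches = coo[:, None] == targets
# Shape (4, 2): True where a whole coo row equals a whole targets row.
row_equal = matches.all(axis=2)
# Shape (4,): True where a coo row equals at least one targets row.
mask = row_equal.any(axis=1)
indices = np.flatnonzero(mask)

print(mask)     # [False  True  True False]
print(indices)  # [1 2]
```

The intermediate array holds len(coo) * len(targets) * 2 booleans, which is the reason this approach becomes expensive for large inputs.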




Answer 2:


The numpy_indexed package implements functionality of this type in a vectorized manner (disclaimer: I am its author). Sadly, numpy lacks a lot of this functionality out of the box. I started numpy_indexed with the intention of having it merged into numpy, but there are some backwards-compatibility concerns, and big packages like that tend to move slowly. So that hasn't happened in the last 3 years, but the Python packaging ecosystem works so well nowadays that adding one more package to your environment is just as simple, really.

import numpy_indexed as npi
# in_(this, that) returns one bool per row of `this`, so coo goes first
bools = npi.in_(coo, targets)

This will have a time complexity similar to that of the solution posted by @fountainhead (logarithmic, whereas the currently accepted answer is linear). In addition, the npi library gives you the safety of automated tests and a lot of other convenient options, should you decide to approach the problem from a slightly different angle.
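
For readers who prefer to stay within plain NumPy, the general sort-based idea can be sketched as follows. This is not taken from numpy_indexed and makes no claim about how that library works internally; it only illustrates giving every distinct row an integer id via np.unique(..., axis=0) and then comparing ids. The names stacked, row_ids, coo_ids and target_ids are illustrative.

```python
import numpy as np

coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])
targets = np.array([[5, 3], [1, 6]])

# Label every distinct row with an integer id; np.unique sorts the rows,
# so the cost is dominated by sorting rather than by pairwise comparison.
stacked = np.concatenate([coo, targets])
_, row_ids = np.unique(stacked, axis=0, return_inverse=True)
# Flatten defensively: some NumPy releases return the inverse with extra axes.
row_ids = row_ids.reshape(-1)

coo_ids, target_ids = row_ids[:len(coo)], row_ids[len(coo):]
mask = np.isin(coo_ids, target_ids)

print(mask)                  # [False  True  True False]
print(np.flatnonzero(mask))  # [1 2]
```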




Answer 3:


Here is a simple and intuitive solution that actually uses numpy.isin() to match tuples rather than individual numbers:

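# View each row as one element of a 1d array of tuples.
# The field type is taken from coo.dtype: a hard-coded 'i,i' (two 32-bit
# ints) only lines up with the rows when the array items are 32 bits wide,
# which is not the default integer size on most platforms.
pair = np.dtype([('x', coo.dtype), ('y', coo.dtype)])
coo_view     = coo.view(pair).reshape(-1)
targets_view = targets.view(pair).reshape(-1)

result = np.isin(coo_view, targets_view)
print(result)
print(result.nonzero()[0])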

Output:

[False  True  True False]
[1 2]

Notes:

  1. The creation of these views does not involve any copying of data.
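  2. A dtype such as 'i,i' specifies that each element of the view is a tuple of two integers. 'i' is typically a 32-bit C int, so the field width has to match the item size of the original array (for example 'i8,i8' for int64 data); otherwise the view no longer corresponds to one tuple per row.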
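
The view approach can be wrapped in a small helper that takes care of the dtype and memory-layout requirements. The sketch below is an assumption-laden illustration, not part of the original answer: rows_in is a hypothetical name, and the helper assumes 2d integer arrays with the same number of columns. Note that np.ascontiguousarray does copy when the input is not already contiguous or has a different dtype.

```python
import numpy as np

def rows_in(a, b):
    """Hypothetical helper: for each row of a, is an identical row present in b?"""
    # Both arrays must share one dtype, and a structured view needs
    # C-contiguous memory.
    common = np.result_type(a, b)
    a = np.ascontiguousarray(a, dtype=common)
    b = np.ascontiguousarray(b, dtype=common)
    # One structured element per row, with one field per column.
    row = np.dtype([(f"f{i}", common) for i in range(a.shape[1])])
    return np.isin(a.view(row).reshape(-1), b.view(row).reshape(-1))

coo = np.array([[1, 2], [1, 6], [5, 3], [3, 6]])
targets = np.array([[5, 3], [1, 6]])

mask = rows_in(coo, targets)
print(mask)                  # [False  True  True False]
print(np.flatnonzero(mask))  # [1 2]
```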


Source: https://stackoverflow.com/questions/54828039/how-to-match-pairs-of-values-contained-in-two-numpy-arrays
