Question
I tried to find the entries in an array that contain a substring, using np.where and an in condition:
import numpy as np
foo = "aa"
bar = np.array(["aaa", "aab", "aca"])
np.where(foo in bar)
This only returns an empty array.
Why is that so?
And is there a good alternative solution?
Answer 1:
We can use np.core.defchararray.find to find the position of the foo string in each element of bar; it returns -1 where the substring is not found. So we can detect whether foo is present in each element by checking the output of find for -1. Finally, we use np.flatnonzero to get the indices of the matches. The implementation looks like this:
np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
Sample run:
In [91]: bar
Out[91]:
array(['aaa', 'aab', 'aca'],
dtype='|S3')
In [92]: foo
Out[92]: 'aa'
In [93]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
Out[93]: array([0, 1])
In [94]: bar[2] = 'jaa'
In [95]: np.flatnonzero(np.core.defchararray.find(bar,foo)!=-1)
Out[95]: array([0, 1, 2])
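The same approach can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name substring_indices is hypothetical, and it spells the call np.char.find (the preferred alias for defchararray) because recent NumPy releases are moving np.core to a private module.

```python
import numpy as np

def substring_indices(arr, sub):
    """Hypothetical helper: indices of the elements of a 1-D string array containing `sub`."""
    positions = np.char.find(arr, sub)      # first match position per element, -1 if absent
    return np.flatnonzero(positions != -1)  # keep only the elements that had a match

bar = np.array(["aaa", "aab", "aca"])
print(substring_indices(bar, "aa"))  # [0 1]
```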
Answer 2:
The way you are trying to use np.where is incorrect. The first argument of np.where should be a boolean array, but you are passing it a single boolean:
>>> foo in bar
False
>>> np.where(False)
(array([], dtype=int32),)
>>> np.where(np.array([True, True, False]))
(array([0, 1], dtype=int32),)
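As a sketch of the point above, the asker's np.where call works once it receives one boolean per element instead of a single bool. Here the mask is built with a plain comprehension, which applies Python's substring test to each element in turn:

```python
import numpy as np

foo = "aa"
bar = np.array(["aaa", "aab", "aca"])

# One boolean per element: foo in x is the ordinary str substring test.
mask = np.array([foo in x for x in bar])
# With a single argument, np.where returns a tuple of index arrays (like np.nonzero).
(indices,) = np.where(mask)
print(indices)  # [0 1]
```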
The problem is that numpy does not define the in operator as an element-wise boolean operation.
One way you could accomplish what you want is with a list comprehension.
foo = 'aa'
bar = np.array(['aaa', 'aab', 'aca'])
out = [i for i, v in enumerate(bar) if foo in v]
# out = [0, 1]
bar = ['aca', 'bba', 'baa', 'aaf', 'ccc']
out = [i for i, v in enumerate(bar) if foo in v]
# out = [2, 3]
Answer 3:
Look at some examples of using in:
In [19]: bar = np.array(["aaa", "aab", "aca"])
In [20]: 'aa' in bar
Out[20]: False
In [21]: 'aaa' in bar
Out[21]: True
In [22]: 'aab' in bar
Out[22]: True
In [23]: 'aab' in list(bar)
Out[23]: True
It looks like in, when used with an array, works as though the array were a list. ndarray does have a __contains__ method, so in works, but it is a simple element-wise equality test, equivalent to (bar == value).any().
In any case, note that x in alist does not check for substrings either. It is the string's own __contains__ that does the substring test, and I don't know of any builtin class that propagates that test down to its component strings.
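A quick check of the behavior described above, as a sketch: membership in the array matches an equality test over its elements (and membership in the equivalent list), while the substring test only happens when str.__contains__ is applied to each element separately.

```python
import numpy as np

bar = np.array(["aaa", "aab", "aca"])

for needle in ["aa", "aaa", "aab"]:
    # ndarray membership compares whole elements for equality...
    assert (needle in bar) == bool((bar == needle).any())
    # ...just like membership in the equivalent Python list.
    assert (needle in bar) == (needle in list(bar))

print("aa" in bar)               # False: no element equals "aa"
print(["aa" in s for s in bar])  # [True, True, False]: per-element substring test
```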
As Divakar shows, there is a collection of numpy functions that apply string methods to the individual elements of an array.
In [42]: np.char.find(bar, 'aa')
Out[42]: array([ 0, 0, -1])
From the np.char docstring:
This module contains a set of functions for vectorized string operations and methods. The preferred alias for defchararray is numpy.char.
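Other functions in np.char can express the same test. As one sketch, np.char.count returns the number of non-overlapping occurrences in each element, so "count greater than zero" is another element-wise substring check:

```python
import numpy as np

bar = np.array(["aaa", "aab", "aca"])

# Non-overlapping occurrences of "aa" in each element (like str.count).
counts = np.char.count(bar, "aa")
print(counts)                      # [1 1 0]
print(np.flatnonzero(counts > 0))  # [0 1]
```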
For operations like this, I think np.char runs at about the same speed as:
In [49]: np.frompyfunc(lambda x: x.find('aa'), 1, 1)(bar)
Out[49]: array([0, 0, -1], dtype=object)
In [50]: np.frompyfunc(lambda x: 'aa' in x, 1, 1)(bar)
Out[50]: array([True, True, False], dtype=object)
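The speed comparison can be checked with a rough benchmark such as the one below. It is a sketch under stated assumptions: the 10,000 random three-letter strings are made-up test data, and timings will vary by machine and NumPy version (NumPy 2.0 reimplemented many string operations as compiled ufuncs, so np.char may come out ahead on newer releases).

```python
import timeit
import numpy as np

# Made-up test data: 10,000 random 3-letter strings over the alphabet "abc".
rng = np.random.default_rng(0)
letters = rng.choice(list("abc"), size=(10_000, 3))
bar = np.array(["".join(row) for row in letters])

def with_char():
    return np.char.find(bar, "aa") != -1

contains = np.frompyfunc(lambda x: "aa" in x, 1, 1)

def with_frompyfunc():
    return contains(bar).astype(bool)  # frompyfunc returns an object array

# Both approaches must agree before their speed is worth comparing.
assert np.array_equal(with_char(), with_frompyfunc())

for fn in (with_char, with_frompyfunc):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1e3:.2f} ms per call")
```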
Further tests suggest that ndarray.__contains__ operates on the flattened array; that is, the array's shape doesn't affect its behavior.
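A small sketch with a 2-D array illustrates this: in still looks for an equal element anywhere in the array, while np.char.find preserves the shape, so np.argwhere can report (row, column) positions of the substring matches.

```python
import numpy as np

bar2d = np.array([["aaa", "aab"], ["aca", "jaa"]])

print("aab" in bar2d)               # True: an equal element exists somewhere
print(bool((bar2d == "aab").any())) # True: the same test spelled out
print("aa" in bar2d)                # False: shape does not turn it into a substring test

# np.char.find keeps the 2-D shape, so argwhere yields (row, col) pairs.
hits = np.argwhere(np.char.find(bar2d, "aa") != -1)
print(hits.tolist())                # [[0, 0], [0, 1], [1, 1]]
```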
Answer 4:
You can also do something like this:
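mask = np.array([foo in x for x in bar])  # True where foo is a substring of the element
matches = bar[mask]  # boolean indexing keeps only the matching entries
indices = np.where(mask)[0]  # or their positions, if those are what you need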
Source: https://stackoverflow.com/questions/38974168/finding-entries-containing-a-substring-in-a-numpy-array