Question
I found a nasty bug in my code where I forgot to convert an integer from str to int before looking it up in a sorted array of integers. Having fixed it, I am still surprised that this didn't cause an explicit exception.
Here's a demo:
In [1]: import numpy as np
In [2]: a = np.arange(1000, dtype=int)
In [3]: a.searchsorted('15')
Out[3]: 150
In [4]: a.searchsorted('150')
Out[4]: 150
In [5]: a.searchsorted('1500')
Out[5]: 151
In [6]: a.searchsorted('foo')
Out[6]: 1000
With a float array this doesn't work, raising a TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'.
My main question is: why does this not cause an exception for an integer array?
This is especially surprising since you can do both np.arange(1000, dtype=int).astype(str) and np.arange(1000, dtype=np.float64).astype(str, casting='safe').
Side questions:
- why is it converting the whole array and not the argument?
- why is the search string converted to '<U32'?
Answer 1:
This behavior happens because searchsorted requires the needle and haystack to have the same dtype. This is achieved using np.promote_types, which has the (perhaps unfortunate) behavior:
>>> np.promote_types(int, str)
dtype('S11')
This means that to get matching dtypes for an integer haystack and a string needle, the only valid transformation is to convert the haystack to a string type.
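To see why the demo in the question prints those particular indices, the conversion can be mimicked in pure Python. This is only a sketch: it assumes that searchsorted behaves like an ordinary left-bisect over the string forms of the integers, and it uses the standard-library bisect module as a stand-in rather than NumPy itself.

```python
from bisect import bisect_left

# The integer haystack 0..999 converted element by element to strings,
# which is roughly what promoting the haystack to a string dtype amounts to.
haystack_as_str = [str(i) for i in range(1000)]

# Strings compare lexicographically ('10' < '2'), so the converted
# haystack is no longer sorted.
print(haystack_as_str == sorted(haystack_as_str))  # False

# Assumption: a plain left-bisect stands in for searchsorted here.
for needle in ("15", "150", "1500", "foo"):
    print(needle, bisect_left(haystack_as_str, needle))
# 15 150
# 150 150
# 1500 151
# foo 1000
```

These indices match the ones in the question's demo. They are the result of a binary search over data that is not sorted in string order, so they carry no numeric meaning.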
Once we have a common dtype, np.can_cast is used to check whether casting to it is allowed. This explains why floats aren't turned into strings, but ints are:
In [1]: np.can_cast(float, np.promote_types(float, str))
Out[1]: False
In [2]: np.can_cast(int, np.promote_types(int, str))
Out[2]: True
So to summarize, the strange behavior is a combination of promotion rules where numeric + string => string, and casting rules where int => string is allowable.
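One way to guard against this class of bug is to check the needle's dtype before searching. The following is a minimal sketch; strict_searchsorted is a hypothetical helper written for illustration, not part of NumPy, and the rule it enforces (numeric haystacks only accept numeric needles, and vice versa) is an assumption about what "compatible" should mean.

```python
import numpy as np

NUMERIC_KINDS = "biuf"  # bool, signed int, unsigned int, float


def strict_searchsorted(haystack, needle, side="left"):
    """Call haystack.searchsorted(needle), but refuse string/number mixes.

    Hypothetical helper: raises TypeError instead of letting NumPy
    silently promote a numeric haystack to a string dtype.
    """
    haystack = np.asarray(haystack)
    needle = np.asarray(needle)
    haystack_is_numeric = haystack.dtype.kind in NUMERIC_KINDS
    needle_is_numeric = needle.dtype.kind in NUMERIC_KINDS
    if haystack_is_numeric != needle_is_numeric:
        raise TypeError(
            "needle dtype %s is not compatible with haystack dtype %s"
            % (needle.dtype, haystack.dtype)
        )
    return haystack.searchsorted(needle, side=side)


a = np.arange(1000, dtype=int)
print(strict_searchsorted(a, 15))  # 15

try:
    strict_searchsorted(a, "15")
except TypeError as exc:
    print("rejected:", exc)
```

With a check like this, the forgotten str-to-int conversion fails loudly at the call site instead of returning a meaningless index.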
Source: https://stackoverflow.com/questions/31325001/why-does-numpy-silently-convert-my-int-array-to-strings-when-calling-searchsorte