Why does numpy silently convert my int array to strings when calling searchsorted?

Question

I found a nasty bug in my code where I forgot to convert a value from str to int before looking it up in a sorted array of integers. Having fixed it, I am still surprised that this didn't raise an explicit exception.

Here's a demo:

In [1]: import numpy as np

In [2]: a = np.arange(1000, dtype=int)

In [3]: a.searchsorted('15')
Out[3]: 150

In [4]: a.searchsorted('150')
Out[4]: 150

In [5]: a.searchsorted('1500')
Out[5]: 151

In [6]: a.searchsorted('foo')
Out[6]: 1000

With a float array this doesn't work: it raises TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'.

My main question is: why does this not cause an exception for an integer array?

This is especially surprising since both np.arange(1000, dtype=int).astype(str) and np.arange(1000, dtype=np.float64).astype(str, casting='safe') work.

Side questions:

- Why is it converting the whole array and not the argument?
- Why is the search string converted to '<U32'?

Answer 1:

This behavior happens because searchsorted requires the needle and haystack to have the same dtype. This is achieved using np.promote_types, which has the (perhaps unfortunate) behavior:

>>> np.promote_types(int, str)
dtype('S11')

This means that to get matching dtypes for an integer haystack and a string needle, the only valid transformation is to convert the haystack to a string type.
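Converting the haystack to strings also accounts for the odd numbers in the question's demo. The stringified array is still in numeric order ('0', '1', ..., '999'), which is not lexicographic order, so the binary search runs on data that is no longer sorted. The sketch below uses only the standard library and rests on two assumptions: that `astype(str)` renders non-negative ints the same way as Python's `str()`, and that searchsorted's default `side='left'` search picks midpoints the same way as `bisect.bisect_left`. Under those assumptions it reproduces all four results from the demo:

```python
import bisect

# Assumption: this list mirrors np.arange(1000).astype(str), i.e. the haystack
# after it has been promoted to a string dtype.
haystack = [str(i) for i in range(1000)]

# Numeric order is not lexicographic order ("10" sorts before "2" as text),
# so the converted haystack breaks searchsorted's "already sorted" contract.
print(haystack == sorted(haystack))  # False

# Binary search with bisect_left, standing in for searchsorted(side='left').
for needle in ["15", "150", "1500", "foo"]:
    print(needle, bisect.bisect_left(haystack, needle))
# 15 150
# 150 150
# 1500 151
# foo 1000   (every digit string sorts before "f")
```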

Once we have a common dtype, we check with np.can_cast whether the haystack can safely be cast to it. This explains why floats aren't turned into strings, but ints are:

In [1]: np.can_cast(float, np.promote_types(float, str))
Out[1]: False

In [2]: np.can_cast(int, np.promote_types(int, str))
Out[2]: True

So, to summarize, the strange behavior is a combination of promotion rules, where numeric + string => string, and casting rules, where int => string is allowed.
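The two rules can be restated as a small decision function, which also shows why it is the whole haystack, not the needle, that ends up converted. This is a hypothetical toy model, not NumPy's implementation or API: "kinds" are plain strings, the names `promote`, `can_cast_safe` and `searchsorted_common_kind` are made up for illustration, and the cast table is filled in only from the `can_cast` results shown in this answer.

```python
# Hypothetical toy model of "promote, then check a safe cast"; NOT NumPy code.
# Only the cases discussed in this answer are modeled.

def promote(haystack_kind: str, needle_kind: str) -> str:
    """Toy promotion: identical kinds stay, numeric + string => string."""
    if haystack_kind == needle_kind:
        return haystack_kind
    if "str" in (haystack_kind, needle_kind):
        return "str"
    raise TypeError(f"no common kind modeled for {haystack_kind} and {needle_kind}")

# Safe-cast results copied from the np.can_cast outputs shown above.
SAFE_TO_STR = {"int": True, "float": False}

def can_cast_safe(src: str, dst: str) -> bool:
    if src == dst:
        return True
    return dst == "str" and SAFE_TO_STR.get(src, False)

def searchsorted_common_kind(haystack_kind: str, needle_kind: str) -> str:
    """Return the kind the haystack is converted to, or raise like the float case."""
    common = promote(haystack_kind, needle_kind)
    if not can_cast_safe(haystack_kind, common):
        raise TypeError(f"Cannot cast {haystack_kind} to {common} under rule 'safe'")
    return common

print(searchsorted_common_kind("int", "str"))  # str: the int haystack becomes strings
try:
    searchsorted_common_kind("float", "str")
except TypeError as exc:
    print(exc)  # Cannot cast float to str under rule 'safe'
```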

Source: https://stackoverflow.com/questions/31325001/why-does-numpy-silently-convert-my-int-array-to-strings-when-calling-searchsorte

Tags: python, arrays, numpy, type-conversion, binary-search