Question
I am using pandas 0.16.2, numpy 1.9.2 and numba 0.20.
Is there any way to get numba to support arrays of strings in nopython mode? Alternatively, could I somehow convert strings to numbers which numba would recognise?
I have to run certain loops on an array of strings (a column from a pandas dataframe); if I could use numba the code would be substantially faster.
I have come up with this minimal example to show what I mean:
import numpy as np
import numba

x=np.array(['some','text','this','is'])

@numba.jit(nopython=True)
def numba_str(txt):
    x=0
    for i in xrange(txt.size):
        if txt[i]=='text':
            x += 1
    return x

print numba_str(x)
The error I get is:
Failed at nopython (nopython frontend)
Undeclared ==([char x 4], str)
Thanks!
Answer 1:
Strings are not yet supported by Numba (as of version 0.20.0). More precisely, "character sequences are supported, but no operations are available on them".
Indeed, a possible workaround is to interpret characters as numbers. For ASCII characters this is straightforward; see the Python ord and chr functions. However, even for your minimal example, you end up with functions that are a lot less readable:
import numpy as np
import numba

x=np.array(['some','text','this','is'])

@numba.jit(nopython=True)
def numba_str(txt):
    x=0
    for i in xrange(txt.shape[0]):
        if (txt[i,0]==116 and  # 't'
            txt[i,1]==101 and  # 'e'
            txt[i,2]==120 and  # 'x'
            txt[i,3]==116):    # 't'
            x += 1
    return x

# View the fixed-width byte strings as a 2-D uint8 array: one row per string, one column per character
print numba_str(x.view(np.uint8).reshape(-1, x.itemsize))
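The hard-coded character codes above can be avoided by passing the target word in encoded form as well. The following is only a sketch of that idea, not part of the original answer: the helper names to_codes and count_matches are hypothetical, the input is assumed to be pure ASCII, and the try/except fallback is there only so the snippet also runs where numba is not installed. The explicit 'S' dtype forces one byte per character, so the same view trick works on Python 3, where strings are Unicode by default.

```python
import numpy as np

try:
    import numba
    jit = numba.jit(nopython=True)
except ImportError:
    # Assumption for illustration: run as plain Python if numba is missing.
    def jit(func):
        return func


def to_codes(words, width):
    """Encode ASCII strings as a 2-D uint8 array (hypothetical helper).

    One row per word, one column per character; shorter words are
    zero-padded and longer words are silently truncated to `width`.
    """
    as_bytes = np.array(words, dtype='S%d' % width)  # fixed-width byte strings
    return as_bytes.view(np.uint8).reshape(-1, width)


@jit
def count_matches(codes, target):
    """Count the rows of `codes` that equal `target` element by element."""
    count = 0
    for i in range(codes.shape[0]):
        same = True
        for j in range(codes.shape[1]):
            if codes[i, j] != target[j]:
                same = False
                break
        if same:
            count += 1
    return count


words = ['some', 'text', 'this', 'is']
codes = to_codes(words, 4)
target = to_codes(['text'], 4)[0]  # same codes as ord('t'), ord('e'), ...
print(count_matches(codes, target))
```

For the four words above this prints 1. The width passed to to_codes should be at least the length of the longest word, including the target; otherwise truncation could produce false matches.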
Source: https://stackoverflow.com/questions/32056337/python-can-numba-work-with-arrays-of-strings-in-nopython-mode