Question
I am constructing a sparse vector using a scipy.sparse.csr_matrix like so:
csr_matrix((values, (np.zeros(len(indices)), indices)), shape = (1, max_index))
This works fine for most of my data, but occasionally I get a ValueError: could not convert integer scalar.
This reproduces the problem:
In [145]: inds
Out[145]:
array([ 827969148, 996833913, 1968345558, 898183169, 1811744124,
2101454109, 133039182, 898183170, 919293479, 133039089])
In [146]: vals
Out[146]:
array([ 1., 1., 1., 1., 1., 2., 1., 1., 1., 1.])
In [147]: max_index
Out[147]:
2337713000
In [143]: csr_matrix((vals, (np.zeros(10), inds)), shape = (1, max_index+1))
...
996 fn = _sparsetools.csr_sum_duplicates
997 M,N = self._swap(self.shape)
--> 998 fn(M, N, self.indptr, self.indices, self.data)
999
1000 self.prune() # nnz may have changed
ValueError: could not convert integer scalar
inds is a np.int64 array and vals is a np.float64 array.
The relevant part of the scipy sum_duplicates code is here.
Note that this works:
In [235]: csr_matrix(([1,1], ([0,0], [1,2])), shape = (1, 2**34))
Out[235]:
<1x17179869184 sparse matrix of type '<type 'numpy.int64'>'
with 2 stored elements in Compressed Sparse Row format>
So the problem is not that one of the dimensions is > 2**31.
Any thoughts why these values should be causing a problem?
Answer 1:
Might it be that max_index > 2**31? Try this, just to make sure:
csr_matrix((vals, (np.zeros(10), inds // 2)), shape = (1, max_index // 2))
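You can check this hypothesis without building any matrix. SciPy's sparse formats store their index arrays as either int32 or int64, so compare the relevant numbers against the largest signed 32-bit integer. The sketch below reuses `inds` and `max_index` from the question. The flag names (`shape_fits` and so on) are just illustrative.

```python
import numpy as np

# Column indices and the maximum index from the question.
inds = np.array([827969148, 996833913, 1968345558, 898183169, 1811744124,
                 2101454109, 133039182, 898183170, 919293479, 133039089],
                dtype=np.int64)
max_index = 2337713000

# Largest value a signed 32-bit index can hold (2**31 - 1).
INT32_MAX = int(np.iinfo(np.int32).max)

# Does the column count used in the failing call fit in int32?
shape_fits = max_index + 1 <= INT32_MAX
# Do the column indices themselves fit in int32?
indices_fit = bool(inds.max() <= INT32_MAX)
# The halved variants from this answer (integer division keeps them integral).
halved_shape_fits = max_index // 2 <= INT32_MAX
halved_indices_fit = bool((inds // 2).max() <= INT32_MAX)

print(shape_fits, indices_fit, halved_shape_fits, halved_indices_fit)
# False True True True
```

With the question's data, the column count max_index + 1 does not fit in int32, while every column index and both halved quantities do. That alone does not explain the failure, because the question's 2**34 example also exceeds int32 and still works.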
Answer 2:
The max index you are giving is less than the maximum index of the rows you are supplying.
This
sparse.csr_matrix((vals, (np.zeros(10), inds)), shape = (1, np.max(inds)+1))
works fine for me.
However, calling .todense() on it results in a MemoryError because of the large size of the matrix.
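A back-of-the-envelope estimate accounts for the MemoryError. `.todense()` materialises every column of the 1 x N matrix as a float64, so it needs N times 8 bytes. The sketch below uses the column count from `np.max(inds)+1` above, where `np.max(inds)` is 2101454109 for the question's data.

```python
import numpy as np

# Column count from this answer: np.max(inds) + 1, with np.max(inds) == 2101454109.
n_cols = 2101454109 + 1

# .todense() allocates one float64 (8 bytes) per column of the 1 x n_cols matrix.
itemsize = np.dtype(np.float64).itemsize
dense_bytes = n_cols * itemsize

print(dense_bytes)          # 16811632880
print(dense_bytes / 2**30)  # about 15.66 GiB
```

That is roughly 15.7 GiB for a single dense row. The CSR matrix, by contrast, only stores the 10 nonzero values and their column indices, so it is usually better to keep working with its `.data` and `.indices` arrays directly.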
Answer 3:
Commenting out the sum_duplicates call will lead to other errors. However, the fix from "strange error when creating csr_matrix" also solves your problem. You can extend the version check to newer versions of scipy.
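import scipy
import scipy.sparse

# Only patch the SciPy releases known to show the problem.
if scipy.__version__ in ("0.14.0", "0.14.1", "0.15.1"):
    # Keep a reference to the original helper.
    _get_index_dtype = scipy.sparse.sputils.get_index_dtype

    def _my_get_index_dtype(*a, **kw):
        # Drop check_contents so index arrays are not downcast to int32 based on their values.
        kw.pop('check_contents', None)
        return _get_index_dtype(*a, **kw)

    # Install the wrapper in the modules that look up get_index_dtype.
    scipy.sparse.compressed.get_index_dtype = _my_get_index_dtype
    scipy.sparse.csr.get_index_dtype = _my_get_index_dtype
    scipy.sparse.bsr.get_index_dtype = _my_get_index_dtype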
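If you extend the version check, one option is to compare the major.minor release instead of listing exact version strings. This sketch is hypothetical. `needs_index_dtype_patch`, `parse_major_minor` and `AFFECTED_SERIES` are made-up names. Treating all of 0.14.x and 0.15.x as affected is an assumption generalised from the three versions listed above, not something taken from SciPy's release notes.

```python
import re

# Assumption: the affected releases are the whole 0.14.x and 0.15.x series,
# generalised from the three exact version strings in the answer.
AFFECTED_SERIES = {(0, 14), (0, 15)}

def parse_major_minor(version):
    """Return (major, minor) from a version string such as '0.15.1' or '0.16.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1)), int(match.group(2))

def needs_index_dtype_patch(version):
    """Hypothetical helper: True if `version` falls in an assumed-affected series."""
    return parse_major_minor(version) in AFFECTED_SERIES

print(needs_index_dtype_patch("0.15.1"))  # True
print(needs_index_dtype_patch("1.11.4"))  # False
```

The `if` line in the patch could then read `if needs_index_dtype_patch(scipy.__version__):`. The patch itself refers to module paths such as `scipy.sparse.sputils` and `scipy.sparse.compressed`, which may not exist under those names in much later SciPy releases.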
Source: https://stackoverflow.com/questions/29168699/cryptic-scipy-could-not-convert-integer-scalar-error