How to efficiently split scipy sparse and numpy arrays into smaller N unequal chunks?

Posted by 隐身守侯 on 2021-01-29 06:01:16

Question


After checking the documentation and this question, I tried to split a numpy array and a scipy sparse matrix as follows:

>>> print(X.shape)
(2399, 39999)

>>> print(type(X))
<class 'scipy.sparse.csr.csr_matrix'>

>>> print(X.toarray())

[[0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 ..., 
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]]

Then:

new_array = np.split(X,3)

Out:

ValueError: array split does not result in an equal division

Then I tried to:

new_array = np.hsplit(X,3)

Out:

ValueError: bad axis1 argument to swapaxes

So, how can I split the array into N chunks of unequal size?


Answer 1:


Make a sparse matrix:

In [62]: M=(sparse.rand(10,3,.3,'csr')*10).astype(int)
In [63]: M
Out[63]: 
<10x3 sparse matrix of type '<class 'numpy.int32'>'
    with 9 stored elements in Compressed Sparse Row format>
In [64]: M.A
Out[64]: 
array([[0, 7, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 5],
       [0, 0, 2],
       [0, 0, 6],
       [0, 4, 4],
       [7, 1, 0],
       [0, 0, 2]])

The dense equivalent is easily split. array_split handles unequal chunks, but you can also spell out the split as illustrated in the other answer.

In [65]: np.array_split(M.A, 3)
Out[65]: 
[array([[0, 7, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]), array([[0, 0, 5],
        [0, 0, 2],
        [0, 0, 6]]), array([[0, 4, 4],
        [7, 1, 0],
        [0, 0, 2]])]

In general, numpy functions cannot work directly on sparse matrices, because a sparse matrix is not a subclass of np.ndarray. Unless the function delegates the action to the object's own method, it probably won't work. Often the function starts with np.asarray(M), which is not the same as M.toarray() (try it yourself).
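To make that last point concrete, here is a minimal sketch. The 3x3 matrix is made up for illustration and is not from the question. On the NumPy/SciPy versions I'm aware of, np.asarray does not densify the matrix but wraps the sparse object in a 0-d object array:

```python
import numpy as np
from scipy import sparse

# Small illustrative matrix (arbitrary values, not from the question).
M = sparse.csr_matrix(np.array([[0, 7, 0],
                                [0, 0, 5],
                                [4, 0, 0]]))

wrapped = np.asarray(M)   # wraps the sparse object; no dense conversion
dense = M.toarray()       # an actual 2-D ndarray

print(wrapped.shape, wrapped.dtype)   # 0-d array holding one Python object
print(dense.shape)                    # (3, 3)
```

This is why np.split and np.hsplit fail on X: they end up operating on that 0-d wrapper rather than on a 2-D array.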

But split is nothing more than slicing along the desired axis. I can produce the same 4,3,3 split with:

In [143]: alist = [M[0:4,:], M[4:7,:], M[7:10]]
In [144]: alist
Out[144]: 
[<4x3 sparse matrix of type '<class 'numpy.int32'>'
    with 1 stored elements in Compressed Sparse Row format>,
 <3x3 sparse matrix of type '<class 'numpy.int32'>'
    with 3 stored elements in Compressed Sparse Row format>,
 <3x3 sparse matrix of type '<class 'numpy.int32'>'
    with 5 stored elements in Compressed Sparse Row format>]
In [145]: [m.A for m in alist]
Out[145]: 
[array([[0, 7, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]], dtype=int32), array([[0, 0, 5],
        [0, 0, 2],
        [0, 0, 6]], dtype=int32), array([[0, 4, 4],
        [7, 1, 0],
        [0, 0, 2]], dtype=int32)]

The rest is administrative details.

I should add that sparse slices are never views. They are new sparse matrices with their own data attribute.
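A quick way to check this yourself, as a sketch with a made-up 4x3 matrix: overwrite the stored values of a slice and confirm that the original matrix is unchanged.

```python
import numpy as np
from scipy import sparse

# Illustrative matrix (arbitrary values).
M = sparse.csr_matrix(np.array([[0, 7, 0],
                                [0, 0, 5],
                                [4, 0, 0],
                                [0, 2, 0]]))

top = M[0:2]          # sparse row slice
top.data[:] = 99      # overwrite the slice's stored values in place

shares = np.shares_memory(top.data, M.data)
print(shares)         # whether the slice's data buffer overlaps the original's
print(M.toarray())    # the original still holds 7 and 5 in its first two rows
```

The practical consequence is that each chunk costs extra memory proportional to its number of stored elements, and editing a chunk does not write back to the source matrix.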


With the split indexes in a list, we can construct the split list with a simple iteration:

In [146]: idx = [0,4,7,10]
In [149]: alist = []
In [150]: for i in range(len(idx)-1):
     ...:     alist.append(M[idx[i]:idx[i+1]])   

I haven't worked out the details of how to construct idx, though an obvious starting point is the 10, i.e. M.shape[0].
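One possible way to build idx, sketched here under the assumption that you want the same chunk sizes np.array_split would give (the first n_rows % n_chunks chunks get one extra row). The helper names row_boundaries and split_sparse_rows are made up for this example:

```python
import numpy as np
from scipy import sparse

def row_boundaries(n_rows, n_chunks):
    """Return boundaries [0, ..., n_rows] following np.array_split's sizing rule."""
    base, extra = divmod(n_rows, n_chunks)
    # The first `extra` chunks are one row longer than the rest.
    sizes = [base + 1] * extra + [base] * (n_chunks - extra)
    return np.concatenate(([0], np.cumsum(sizes))).tolist()

def split_sparse_rows(M, n_chunks):
    """Slice a sparse matrix into n_chunks row blocks without densifying it."""
    idx = row_boundaries(M.shape[0], n_chunks)
    return [M[start:stop] for start, stop in zip(idx[:-1], idx[1:])]

M = sparse.random(10, 3, density=0.3, format='csr', random_state=0)
idx = row_boundaries(M.shape[0], 3)
chunks = split_sparse_rows(M, 3)

print(idx)                        # [0, 4, 7, 10]
print([c.shape for c in chunks])  # [(4, 3), (3, 3), (3, 3)]
```

The chunks stay sparse throughout, so this should also be usable on a matrix as large as the one in the question.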

For even splits (where the chunk size divides the number of rows):

In [160]: [M[i:i+5,:] for i in range(0,M.shape[0],5)]
Out[160]: 
[<5x3 sparse matrix of type '<class 'numpy.int32'>'
    with 2 stored elements in Compressed Sparse Row format>,
 <5x3 sparse matrix of type '<class 'numpy.int32'>'
    with 7 stored elements in Compressed Sparse Row format>]



Answer 2:


First, convert the scipy.sparse.csr_matrix to a numpy ndarray, then pass a list of indices to numpy.split(ary, indices_or_sections, axis=0).

If indices_or_sections is a 1-D array of sorted integers, the entries indicate where along axis the array is split. For example, [2, 3] would, for axis=0, result in ary[:2], ary[2:3], and ary[3:].

https://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html

X1, X2, X3 = np.split(X.toarray(), [1000,2000])
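Note that X.toarray() on a 2399 x 39999 matrix allocates roughly 96 million dense entries. If that is a concern, the same cut points can be applied by slicing the sparse matrix directly. The sketch below uses a random stand-in for X, since the real data is not shown; the density is an assumption:

```python
from scipy import sparse

# Stand-in for the question's X: same shape, random content, assumed density.
X = sparse.random(2399, 39999, density=1e-4, format='csr', random_state=0)

cuts = [1000, 2000]                   # same indices as passed to np.split above
bounds = [0] + cuts + [X.shape[0]]    # add the start and end of the row axis

# Each slice is a new sparse matrix; nothing is converted to dense.
X1, X2, X3 = [X[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

print(X1.shape, X2.shape, X3.shape)   # (1000, 39999) (1000, 39999) (399, 39999)
```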


Source: https://stackoverflow.com/questions/43049072/how-to-efficiently-split-scipy-sparse-and-numpy-arrays-into-smaller-n-unequal-ch
