Sum over rows in scipy.sparse.csr_matrix

时间秒杀一切 submitted on 2019-11-28 11:44:19

You can do this by multiplying A on the left by a carefully constructed summing matrix S, where S[g, i] = 1 when row i of A should be added into output row g. Here's how it works for a dense matrix:

>>> import numpy as np
>>> S = np.array([[1, 0, 0, 1, 0], [0, 1, 1, 0, 1]])
>>> np.dot(S, A.toarray())
array([[5, 0, 0, 0, 0],
       [0, 5, 5, 0, 0]])
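For a self-contained version of the dense idea, the sketch below builds S from a per-row group label instead of typing it by hand. The 5 x 5 values of A_dense are an assumption (chosen only so the sums match the output above), and labels is a hypothetical name for "which output row each input row goes to".

```python
import numpy as np

# Assumed example input: values chosen so the row sums match the output above.
A_dense = np.array([[1, 0, 0, 0, 0],
                    [0, 0, 3, 0, 0],
                    [0, 5, 0, 0, 0],
                    [4, 0, 0, 0, 0],
                    [0, 0, 2, 0, 0]])

# Hypothetical group label per input row: rows 0 and 3 -> group 0, rows 1, 2, 4 -> group 1.
labels = np.array([0, 1, 1, 0, 1])
n_groups = labels.max() + 1

# S[g, i] == 1 exactly when input row i belongs to group g.
S = (labels[np.newaxis, :] == np.arange(n_groups)[:, np.newaxis]).astype(int)

B = S @ A_dense  # row g of B is the sum of the rows of A_dense labelled g
print(S)
print(B)
```

Comparing each label against np.arange(n_groups) with broadcasting yields one 0/1 row per group, which is the same S written out literally above.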

The sparse version is only a little more complicated. The information about which rows are summed together is encoded in row: column i of S holds a single 1 in row row[i], so row i of A is added into output row row[i].

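import numpy as np
from scipy.sparse import csr_matrix

# A: the csr_matrix whose rows are being summed (5 rows here)
col = np.arange(5)            # one column of S per row of A
row = [0, 1, 1, 0, 1]         # output row that each row of A is added to
dat = np.ones(5, dtype=int)   # weight 1 gives a plain sum
S = csr_matrix((dat, (row, col)), shape=(2, 5))
result = S @ A                # sparse matrix product (same as S * A for csr_matrix)
# check that the result is another sparse matrix
print(type(result))
# check that the values are the ones we want
print(result.toarray())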

Output (the exact class path printed depends on the SciPy version):

<class 'scipy.sparse.csr.csr_matrix'>
[[5 0 0 0 0]
 [0 5 5 0 0]]

To produce more output rows, use larger values in row and increase the first dimension of S's shape to match.
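Generalizing this, a small helper can build S for any number of groups from a label array. This is a minimal sketch, not a SciPy function: sum_rows_by_group is a hypothetical name, and the values of A are assumed (chosen to match the output above).

```python
import numpy as np
from scipy.sparse import csr_matrix


def sum_rows_by_group(A, labels, n_groups=None):
    """Sum the rows of sparse matrix A that share a group label (hypothetical helper).

    labels[i] is the 0-based output row that input row i is added to.
    Returns a sparse matrix of shape (n_groups, A.shape[1]).
    """
    labels = np.asarray(labels)
    if n_groups is None:
        n_groups = int(labels.max()) + 1
    n_rows = A.shape[0]
    # One nonzero per column of S: S[labels[i], i] = 1.
    S = csr_matrix(
        (np.ones(n_rows, dtype=A.dtype), (labels, np.arange(n_rows))),
        shape=(n_groups, n_rows),
    )
    return S @ A  # sparse x sparse product stays sparse


# Assumed example input, consistent with the output above.
A = csr_matrix(np.array([[1, 0, 0, 0, 0],
                         [0, 0, 3, 0, 0],
                         [0, 5, 0, 0, 0],
                         [4, 0, 0, 0, 0],
                         [0, 0, 2, 0, 0]]))

# Three output rows: label 2 puts row 4 in a group of its own.
B = sum_rows_by_group(A, [0, 1, 1, 0, 2])
print(B.toarray())
```

Because S has exactly one nonzero per column, building it costs O(number of rows of A), and the product never densifies A.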

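Alternatively, you can select and sum each group of rows directly. Python indexing is zero-based, so the row indices should be:

idx1 = [0, 3]       # 1st and 4th rows
idx2 = [1, 2, 4]    # 2nd, 3rd and 5th rows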

Then sum each group with axis=0 (down the rows, giving one row per group) and keep A_sub1 and A_sub2 in sparse format so they can be stacked with scipy.sparse.vstack:

from scipy.sparse import csr_matrix, vstack

A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))  # 1 x 5 row sum, converted back to sparse
A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
B = vstack((A_sub1, A_sub2))
B.toarray()
array([[5, 0, 0, 0, 0],
       [0, 5, 5, 0, 0]])
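With more than two groups, the same index-and-sum approach can be written as a loop over a list of index groups. A sketch assuming the example values of A (chosen to match the output above) and a hypothetical groups list:

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

# Assumed example input, consistent with the output above.
A = csr_matrix(np.array([[1, 0, 0, 0, 0],
                         [0, 0, 3, 0, 0],
                         [0, 5, 0, 0, 0],
                         [4, 0, 0, 0, 0],
                         [0, 0, 2, 0, 0]]))

groups = [[0, 3], [1, 2, 4]]  # hypothetical list of 0-based row-index groups

# Each .sum(axis=0) is a dense 1 x n_cols result; wrap it back into csr_matrix
# and ask vstack for CSR output explicitly.
B = vstack([csr_matrix(A[idx, :].sum(axis=0)) for idx in groups], format="csr")
print(B.toarray())
```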

Note that each A[idx, :].sum(axis=0) returns a dense numpy.matrix, so this approach round-trips every row sum through a dense array; @Mr_E's answer is probably better.
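A quick way to check both points, that the row sums come back dense and that the index-and-sum and matrix-product approaches agree, is sketched below. The random test matrix, labels and n_groups are made-up illustration data, not from the original answers.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

rng = np.random.default_rng(0)
# Made-up test data: a small integer matrix with roughly 30% nonzeros.
dense = rng.integers(1, 5, size=(8, 6)) * (rng.random((8, 6)) < 0.3)
A = csr_matrix(dense)
labels = np.array([0, 2, 1, 0, 2, 1, 1, 0])  # hypothetical group for each row
n_groups = 3
n_rows = len(labels)

# Approach 1: sparse summing matrix; the result stays sparse.
S = csr_matrix((np.ones(n_rows, dtype=A.dtype), (labels, np.arange(n_rows))),
               shape=(n_groups, n_rows))
B_matmul = S @ A

# Approach 2: index and sum; each .sum(axis=0) comes back as a dense numpy.matrix.
row_sums = [A[np.flatnonzero(labels == g), :].sum(axis=0) for g in range(n_groups)]
B_indexed = vstack([csr_matrix(r) for r in row_sums])

print(type(row_sums[0]))
print(np.array_equal(B_matmul.toarray(), B_indexed.toarray()))
```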

Alternatively, if a dense result is acceptable, use axis=0 with np.vstack (as opposed to scipy.sparse.vstack), which stacks the dense row sums directly:

import numpy as np

A_sub1 = A[idx1, :].sum(axis=0)   # dense 1 x 5 numpy.matrix
A_sub2 = A[idx2, :].sum(axis=0)
np.vstack((A_sub1, A_sub2))

Giving:

matrix([[5, 0, 0, 0, 0],
        [0, 5, 5, 0, 0]])
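The dense variant also extends to any number of groups with a list comprehension. A sketch assuming the example values of A (chosen to match the output above) and a hypothetical groups list:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Assumed example input, consistent with the output above.
A = csr_matrix(np.array([[1, 0, 0, 0, 0],
                         [0, 0, 3, 0, 0],
                         [0, 5, 0, 0, 0],
                         [4, 0, 0, 0, 0],
                         [0, 0, 2, 0, 0]]))

groups = [[0, 3], [1, 2, 4]]  # hypothetical list of 0-based row-index groups

# Each .sum(axis=0) is a dense 1 x 5 numpy.matrix; np.vstack stacks them densely.
B_dense = np.vstack([A[idx, :].sum(axis=0) for idx in groups])
print(B_dense)
```

This only makes sense when the number of groups times the number of columns fits comfortably in memory, since the whole result is dense.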