Sum over rows in scipy.sparse.csr_matrix

长发绾君心 2020-12-11 10:43

I have a big csr_matrix and I want to sum over rows, obtaining a new csr_matrix with the same number of columns but a reduced number of rows. (Context: The matrix is a documen

2 Answers
  • 2020-12-11 10:56

    You can do this by multiplying A on the left by a carefully constructed "summing" matrix S. Here's how it works for a dense matrix:

    >>> import numpy as np
    >>> S = np.array([[1, 0, 0, 1, 0], [0, 1, 1, 0, 1]])
    >>> np.dot(S, A.toarray())
    array([[5, 0, 0, 0, 0],
           [0, 5, 5, 0, 0]])
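The same S can be built from a vector of group labels instead of being typed out by hand. A minimal dense sketch; the 5 x 5 matrix A and the name labels are hypothetical, with A chosen so that the result matches the output above:

```python
import numpy as np

# Hypothetical 5 x 5 data; rows 0 and 3 form one group, rows 1, 2 and 4 the other.
A = np.array([
    [2, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 1, 0, 0],
    [3, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
])
labels = np.array([0, 1, 1, 0, 1])  # output row for each input row (hypothetical name)
n_groups = labels.max() + 1

# S[i, j] == 1 exactly when input row j belongs to group i
S = (labels[np.newaxis, :] == np.arange(n_groups)[:, np.newaxis]).astype(A.dtype)

B = S @ A
print(S)
print(B)
```

Row i of S is a 0/1 indicator of which input rows belong to group i, so S @ A adds exactly those rows together.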
    

    The sparse version is only a little more complicated. Which rows get summed together is encoded in row: input row j of A is added into output row row[j].

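    import numpy as np
    from scipy.sparse import csr_matrix

    col = np.arange(5)           # input row index (one column of S per row of A)
    row = [0, 1, 1, 0, 1]        # output row each input row is added into
    dat = [1, 1, 1, 1, 1]        # weight of each input row (1 = plain sum)
    S = csr_matrix((dat, (row, col)), shape=(2, 5))
    # @ is matrix multiplication; * also works for csr_matrix, but not for sparse arrays
    result = S @ A
    # check that the result is another sparse matrix
    print(type(result))
    # check that the values are the ones we want
    print(result.toarray())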
    

    Output (as printed by older SciPy versions; newer ones name the class scipy.sparse._csr.csr_matrix):

    <class 'scipy.sparse.csr.csr_matrix'>
    [[5 0 0 0 0]
     [0 5 5 0 0]]
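Why the product sums rows: with col = range(5), the triplet constructor places dat[j] at position (row[j], j), so each column of S has exactly one nonzero, and row i of S A is a weighted sum of the rows of A assigned to group i:

```latex
S_{ij} = \begin{cases} \mathrm{dat}_j & \text{if } \mathrm{row}_j = i \\ 0 & \text{otherwise} \end{cases}
\qquad\Longrightarrow\qquad
(SA)_{i,:} = \sum_{j} S_{ij} A_{j,:} = \sum_{j \,:\, \mathrm{row}_j = i} \mathrm{dat}_j \, A_{j,:}
```

With dat all ones and row = [0, 1, 1, 0, 1], this gives (SA)_{0,:} = A_{0,:} + A_{3,:} and (SA)_{1,:} = A_{1,:} + A_{2,:} + A_{4,:}.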
    

    You can produce more output rows by using higher values in row and setting the first dimension of S's shape to the number of output rows (the second dimension must stay equal to the number of rows of A).
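A sketch of that generalisation as a reusable function; sum_rows_by_group, labels and n_groups are hypothetical names, and the 6 x 4 matrix is made-up example data:

```python
import numpy as np
from scipy.sparse import csr_matrix


def sum_rows_by_group(A, labels, n_groups=None):
    """Hypothetical helper: row i of the result is the sum of the rows of A with labels == i."""
    labels = np.asarray(labels)
    if n_groups is None:
        n_groups = int(labels.max()) + 1
    n_rows = A.shape[0]
    # One nonzero per input row j, placed at (labels[j], j)
    S = csr_matrix(
        (np.ones(n_rows, dtype=A.dtype), (labels, np.arange(n_rows))),
        shape=(n_groups, n_rows),
    )
    return (S @ A).tocsr()


# Made-up 6 x 4 example with three groups
A = csr_matrix(np.array([
    [1, 0, 0, 2],
    [0, 3, 0, 0],
    [4, 0, 0, 0],
    [0, 0, 5, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 6],
]))
B = sum_rows_by_group(A, [0, 1, 2, 0, 1, 2], n_groups=3)
print(B.toarray())
```

Because S has one nonzero per input row, building it is linear in the number of rows, and the product is sparse times sparse, so nothing is densified along the way.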

  • 2020-12-11 11:09

    The (0-based) indexing should be:

    idx1 = [0, 3]       # rows 1 and 4 (1-based)
    idx2 = [1, 2, 4]    # rows 2, 3 and 5 (1-based)
    

    Then keep A_sub1 and A_sub2 in sparse format and use axis=0, which adds the selected rows together:

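    from scipy.sparse import csr_matrix, vstack

    # sum(axis=0) adds the selected rows, giving a single 1 x n row
    A_sub1 = csr_matrix(A[idx1, :].sum(axis=0))
    A_sub2 = csr_matrix(A[idx2, :].sum(axis=0))
    B = vstack((A_sub1, A_sub2))
    print(B.toarray())
    # [[5 0 0 0 0]
    #  [0 5 5 0 0]]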
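With more than two groups the same idea fits in a list comprehension. A sketch assuming the groups are given as lists of 0-based row indices (groups is a hypothetical name, and A is made-up data that reproduces the output above):

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

# Made-up 5 x 5 data
A = csr_matrix(np.array([
    [2, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 1, 0, 0],
    [3, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
]))
groups = [[0, 3], [1, 2, 4]]  # hypothetical: one list of row indices per output row

# Each sum is a dense 1 x n row; wrap it back in csr_matrix, then stack as CSR
B = vstack([csr_matrix(A[idx, :].sum(axis=0)) for idx in groups], format="csr")
print(B.toarray())
```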
    

    Note that each A[idx, :].sum(axis=0) returns a dense 1 x n numpy.matrix, which then has to be converted back to sparse, so @Mr_E's answer is probably better.
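A quick check of that claim on a small made-up matrix: for the csr_matrix class, summing along an axis returns a dense numpy.matrix rather than a sparse matrix:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

A = csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))  # made-up example
col_sums = A[[0, 1], :].sum(axis=0)

print(type(col_sums))      # a dense numpy.matrix, not a sparse type
print(issparse(col_sums))  # False
print(col_sums)
```

The dense intermediate is only one row per group, so its cost is one row of length n per group; the summing-matrix approach avoids it entirely.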

    Alternatively, if a dense result is acceptable, use axis=0 with np.vstack (as opposed to scipy.sparse.vstack):

    import numpy as np

    A_sub1 = A[idx1, :].sum(axis=0)
    A_sub2 = A[idx2, :].sum(axis=0)
    np.vstack((A_sub1, A_sub2))
    

    Giving:

    matrix([[5, 0, 0, 0, 0],
            [0, 5, 5, 0, 0]])
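To convince yourself that both answers compute the same thing, a sketch that compares them against a plain dense group sum on a random sparse matrix (the seed, sizes, labels and n_groups are arbitrary assumptions):

```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
# Random 8 x 6 integer matrix with roughly 30% nonzeros
dense = rng.integers(0, 5, size=(8, 6)) * (rng.random((8, 6)) < 0.3)
A = csr_matrix(dense)
labels = np.array([0, 2, 1, 0, 2, 1, 1, 0])  # hypothetical group of each row
n_groups = 3

# Approach 1: summing matrix S (sparse result)
S = csr_matrix((np.ones(8), (labels, np.arange(8))), shape=(n_groups, 8))
B_sparse = S @ A

# Approach 2: index each group and stack with np.vstack (dense numpy.matrix result)
B_dense = np.vstack([A[np.flatnonzero(labels == g), :].sum(axis=0) for g in range(n_groups)])

# Reference: group sums computed directly on the dense array
expected = np.array([dense[labels == g].sum(axis=0) for g in range(n_groups)])

# np.asarray turns the numpy.matrix into a plain ndarray before comparing
print(np.allclose(B_sparse.toarray(), expected))
print(np.allclose(np.asarray(B_dense), expected))
```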
    