numpy: efficiently summing with index arrays

后端 未结 3 605
失恋的感觉
失恋的感觉 2020-11-28 13:49

Suppose I have 2 matrices M and N (both have > 1 columns). I also have an index matrix I with 2 columns -- 1 for M and one for N. The indices for N are unique, but the indic

3条回答
  •  谎友^
    谎友^ (楼主)
    2020-11-28 14:19

    For clarity, let's define

    >>> m_ind, n_ind = w.T
    

    Then the for loop

    for i, j in zip(m_ind, n_ind):
        M[i] += N[j]
    

    updates the entries M[np.unique(m_ind)]. The values that get written to it are N[n_ind], which must be grouped by m_ind. (The fact that there's an n_ind in addition to m_ind is actually tangential to the question; you could just set N = N[n_ind].) There happens to be a SciPy class that does exactly this: scipy.sparse.csr_matrix.

    Example data:

    >>> m_ind, n_ind = array([[0, 0, 1, 1], [2, 3, 0, 1]])
    >>> M = np.arange(2, 6)
    >>> N = np.logspace(2, 5, 4)
    

    The result of the for loop is that M becomes [110002 1103 4 5]. We get the same result with a csr_matrix as follows. As I said earlier, n_ind isn't relevant, so we get rid of that first.

    >>> N = N[n_ind]
    >>> from scipy.sparse import csr_matrix
    >>> update = csr_matrix((N, m_ind, [0, len(N)])).toarray()
    

    The CSR constructor builds a matrix with the required values at the required indices; the third part of its argument is a compressed column index, meaning that the values N[0:len(N)] have the indices m_ind[0:len(N)]. Duplicates are summed:

    >>> update
    array([[ 110000.,    1100.]])
    

    This has shape (1, len(np.unique(m_ind))) and can be added in directly:

    >>> M[np.unique(m_ind)] += update.ravel()
    >>> M
    array([110002,   1103,      4,      5])
    

提交回复
热议问题