numpy: efficiently summing with index arrays

后端未结

关注

 3  605

失恋的感觉 2020-11-28 13:49

Suppose I have 2 matrices M and N (both have > 1 columns). I also have an index matrix I with 2 columns -- 1 for M and one for N. The indices for N are unique, but the indic

3条回答

谎友^ (楼主)

2020-11-28 14:19
For clarity, let's define
```
>>> m_ind, n_ind = w.T
```
Then the for loop
```
for i, j in zip(m_ind, n_ind):
    M[i] += N[j]
```
updates the entries M[np.unique(m_ind)]. The values that get written to it are N[n_ind], which must be grouped by m_ind. (The fact that there's an n_ind in addition to m_ind is actually tangential to the question; you could just set N = N[n_ind].) There happens to be a SciPy class that does exactly this: scipy.sparse.csr_matrix.

Example data:
```
>>> m_ind, n_ind = array([[0, 0, 1, 1], [2, 3, 0, 1]])
>>> M = np.arange(2, 6)
>>> N = np.logspace(2, 5, 4)
```
The result of the for loop is that M becomes [110002 1103 4 5]. We get the same result with a csr_matrix as follows. As I said earlier, n_ind isn't relevant, so we get rid of that first.
```
>>> N = N[n_ind]
>>> from scipy.sparse import csr_matrix
>>> update = csr_matrix((N, m_ind, [0, len(N)])).toarray()
```
The CSR constructor builds a matrix with the required values at the required indices; the third part of its argument is a compressed column index, meaning that the values N[0:len(N)] have the indices m_ind[0:len(N)]. Duplicates are summed:
```
>>> update
array([[ 110000.,    1100.]])
```
This has shape (1, len(np.unique(m_ind))) and can be added in directly:
```
>>> M[np.unique(m_ind)] += update.ravel()
>>> M
array([110002,   1103,      4,      5])
```
0 讨论(0)

查看其它3个回答
发布评论:

提交评论
- 加载中...