I've got a large matrix stored as a scipy.sparse.csc_matrix and want to subtract a column vector from each one of the columns in the large matrix. This is a pretty common task.
For a start, what would we do with dense arrays?
mat - vec.A                 # taking advantage of broadcasting; vec.A is a dense (n,1) column
mat - vec.A[:, [0]*3]       # explicit broadcasting (mat has 3 columns here)
mat - vec[:, [0, 0, 0]]     # replicating the sparse column; that also works with a csr matrix
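For concreteness, here is a small setup those three lines run against (the 3x3 values are my own illustration; the question only specifies that mat is a csc_matrix and vec a column vector):

import numpy as np
from scipy import sparse

mat = sparse.csc_matrix(np.array([[1., 0., 2.],
                                  [0., 3., 0.],
                                  [4., 0., 5.]]))
vec = sparse.csc_matrix(np.array([[1.], [2.], [3.]]))

print(mat - vec.A)                  # broadcasts the dense (3,1) column
print(mat - vec.A[:, [0]*3])        # same broadcast spelled out explicitly
print((mat - vec[:, [0, 0, 0]]).A)  # all-sparse version of the same subtraction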
In https://codereview.stackexchange.com/questions/32664/numpy-scipy-optimization/33566
we found that using as_strided on the mat.indptr vector is the most efficient way of stepping through the rows of a sparse matrix. (The x.rows and x.data lists of an lil_matrix are nearly as good; getrow is slow.) This function implements such an iteration; note that it assumes X is in csr format, so that indptr delimits rows.
from numpy.lib.stride_tricks import as_strided

def sum(X, v):
    # Subtract v[row] from every stored value in each row of X, in place.
    # Assumes X is in csr format, so X.indptr delimits rows.
    rows, cols = X.shape
    # view indptr as overlapping (start, stop) pairs, without copying
    row_start_stop = as_strided(X.indptr, shape=(rows, 2),
                                strides=2*X.indptr.strides)
    for row, (start, stop) in enumerate(row_start_stop):
        data = X.data[start:stop]   # a view into X.data
        data -= v[row]
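A quick check of what that strided view contains (np.column_stack is just the copying equivalent):

import numpy as np
from numpy.lib.stride_tricks import as_strided

indptr = np.array([0, 2, 3, 5])    # hypothetical indptr of a 3-row csr matrix
pairs = as_strided(indptr, shape=(3, 2), strides=2*indptr.strides)
print(pairs)                       # [[0 2] [2 3] [3 5]] -- (start, stop) per row
print(np.column_stack((indptr[:-1], indptr[1:])))   # same pairs, but copied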
sum(mat, vec.A)
print(mat.A)
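Run end to end with the small example from above, converted to csr so that indptr really does delimit rows (the data values are still my own illustration; this reuses the imports and the sum function defined above):

mat = sparse.csr_matrix(np.array([[1., 0., 2.],
                                  [0., 3., 0.],
                                  [4., 0., 5.]]))
vec = sparse.csc_matrix(np.array([[1.], [2.], [3.]]))

sum(mat, vec.A)      # modifies mat.data in place
print(mat.A)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 2.]]
# the stored entries got vec subtracted; the original zeros are untouched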
I'm using vec.A for simplicity. If we kept vec sparse we'd have to add a test for a nonzero value at each row. Also, this type of iteration only modifies the nonzero elements of mat; zeros are unchanged.
I suspect the time advantages will depend a lot on the sparsity of the matrix and the vector. If vec has lots of zeros, then it makes sense to iterate, modifying only those rows of mat where vec is nonzero. But if vec is nearly dense, as in this example, it may be hard to beat mat - vec.A.
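A minimal sketch of that sparse-vec variant (my own illustration, not from the linked review; it assumes X is csr and v is a sparse column vector, and touches only the rows where v is nonzero):

from numpy.lib.stride_tricks import as_strided

def sub_sparse_vec(X, v):
    # Subtract sparse column vector v from every column of csr matrix X, in place.
    # Rows where v is zero are skipped entirely.
    rows = X.shape[0]
    row_start_stop = as_strided(X.indptr, shape=(rows, 2),
                                strides=2*X.indptr.strides)
    v = v.tocsc()
    # v.indices holds the row numbers of v's nonzeros, v.data their values
    for row, val in zip(v.indices, v.data):
        start, stop = row_start_stop[row]
        X.data[start:stop] -= val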