I've got a large matrix stored as a scipy.sparse.csc_matrix and want to subtract a column vector from each one of the columns in the large matrix. This is a pretty common task.
For a start, what would we do with dense arrays?
mat - vec.A                 # taking advantage of broadcasting; vec.A is a dense (n,1) column
mat - vec.A[:, [0]*3]       # explicit broadcasting (mat has 3 columns here)
mat - vec[:, [0, 0, 0]]     # replicating the sparse column; that also works with a csr matrix
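For concreteness, here is a small setup those three lines run against (the 3x3 values are my own illustration; the question only specifies that mat is a csc_matrix and vec a column vector):

import numpy as np
from scipy import sparse

mat = sparse.csc_matrix(np.array([[1., 0., 2.],
                                  [0., 3., 0.],
                                  [4., 0., 5.]]))
vec = sparse.csc_matrix(np.array([[1.], [2.], [3.]]))

print(mat - vec.A)                  # broadcasts the dense (3,1) column
print(mat - vec.A[:, [0]*3])        # same broadcast spelled out explicitly
print((mat - vec[:, [0, 0, 0]]).A)  # all-sparse version of the same subtraction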
In https://codereview.stackexchange.com/questions/32664/numpy-scipy-optimization/33566
we found that using as_strided on the mat.indptr vector is the most efficient way of stepping through the rows of a sparse matrix. (The x.rows and x.data lists of an lil_matrix are nearly as good; getrow is slow.) This function implements such an iteration; note that it assumes X is in csr format, so that indptr delimits rows.
from numpy.lib.stride_tricks import as_strided

def sum(X, v):
    # Subtract v[row] from every stored value in each row of X, in place.
    # Assumes X is in csr format, so X.indptr delimits rows.
    rows, cols = X.shape
    # view indptr as overlapping (start, stop) pairs, without copying
    row_start_stop = as_strided(X.indptr, shape=(rows, 2),
                                strides=2*X.indptr.strides)
    for row, (start, stop) in enumerate(row_start_stop):
        data = X.data[start:stop]   # a view into X.data
        data -= v[row]
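A quick check of what that strided view contains (np.column_stack is just the copying equivalent):

import numpy as np
from numpy.lib.stride_tricks import as_strided

indptr = np.array([0, 2, 3, 5])    # hypothetical indptr of a 3-row csr matrix
pairs = as_strided(indptr, shape=(3, 2), strides=2*indptr.strides)
print(pairs)                       # [[0 2] [2 3] [3 5]] -- (start, stop) per row
print(np.column_stack((indptr[:-1], indptr[1:])))   # same pairs, but copied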
sum(mat, vec.A)
print(mat.A)
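Run end to end with the small example from above, converted to csr so that indptr really does delimit rows (the data values are still my own illustration; this reuses the imports and the sum function defined above):

mat = sparse.csr_matrix(np.array([[1., 0., 2.],
                                  [0., 3., 0.],
                                  [4., 0., 5.]]))
vec = sparse.csc_matrix(np.array([[1.], [2.], [3.]]))

sum(mat, vec.A)      # modifies mat.data in place
print(mat.A)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 2.]]
# the stored entries got vec subtracted; the original zeros are untouched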
I'm using vec.A for simplicity. If we kept vec sparse we'd have to add a test for a nonzero value at each row. Also, this type of iteration only modifies the nonzero elements of mat; zeros are unchanged.
I suspect the time advantages will depend a lot on the sparsity of the matrix and the vector. If vec has lots of zeros, then it makes sense to iterate, modifying only those rows of mat where vec is nonzero. But if vec is nearly dense, as in this example, it may be hard to beat mat - vec.A.
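A minimal sketch of that sparse-vec variant (my own illustration, not from the linked review; it assumes X is csr and v is a sparse column vector, and touches only the rows where v is nonzero):

from numpy.lib.stride_tricks import as_strided

def sub_sparse_vec(X, v):
    # Subtract sparse column vector v from every column of csr matrix X, in place.
    # Rows where v is zero are skipped entirely.
    rows = X.shape[0]
    row_start_stop = as_strided(X.indptr, shape=(rows, 2),
                                strides=2*X.indptr.strides)
    v = v.tocsc()
    # v.indices holds the row numbers of v's nonzeros, v.data their values
    for row, val in zip(v.indices, v.data):
        start, stop = row_start_stop[row]
        X.data[start:stop] -= val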