Question
I am looking for an efficient way to multiply a dense matrix by a sparse vector, Av, where A is of size (M x N) and v is (N x 1). The vector v is a scipy.sparse.csc_matrix.
I have two methods I use at the moment:
In method 1, I pick out the non-zero values of v, say vi, multiply each vi element-wise with the corresponding column of A, and then sum these columns. So if y = Av, then y = A[:, 0]*v0 + ... + A[:, N-1]*v(N-1), where only the terms with non-zero vi are included.
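    import numpy as np

    def dense_dot_sparse(dense_matrix, sparse_column):
        # Accumulate A[:, i] * v_i over the non-zero entries of v only.
        prod = np.zeros(dense_matrix.shape[0])
        # v is (N x 1), so the row index i of each non-zero selects the column of A.
        rows, _ = sparse_column.nonzero()
        for i in rows:
            prod = prod + dense_matrix[:, i] * sparse_column[i, 0]
        return prod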
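The column sum used in method 1 can also be written without a Python loop. The following is a minimal sketch, assuming v is an (N x 1) scipy.sparse.csc_matrix, so that its indices attribute holds the row positions of the stored entries and its data attribute holds their values. The function name dense_dot_sparse_indexed is hypothetical, and no timings have been measured for it here.

```python
import numpy as np
from scipy.sparse import csc_matrix


def dense_dot_sparse_indexed(dense_matrix, sparse_column):
    # Assumption: sparse_column is an (N x 1) CSC matrix. For a single column,
    # .indices are the row positions of the stored entries and .data their values.
    v = csc_matrix(sparse_column)
    # Select only the matching columns of A, then take one small dense product.
    return dense_matrix[:, v.indices].dot(v.data)


# Small check against the fully dense product.
A = np.arange(12, dtype=float).reshape(3, 4)
v = csc_matrix(np.array([[0.0], [2.0], [0.0], [-1.0]]))
y = dense_dot_sparse_indexed(A, v)
```

The result is a 1-D array of length M, like the output of method 1. Whether it is faster for a given number of non-zeros would still need to be timed on the real data.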
In method 2, I perform the multiplication by simply converting the sparse vector with .todense() and using np.dot().
    def dense_dot_sparse2(dense_matrix, sparse_column):
        return np.dot(dense_matrix, sparse_column.todense())
The typical size of A is (512 x 2048), and v has between 1 and 200 non-zero entries. I choose which method to use based on the sparsity of v. With ~200 non-zeros, method 1 takes ~45 ms and method 2 takes ~5 ms. But when v is very sparse (~1 non-zero), method 1 takes ~1 ms whereas method 2 still takes 5 ms. Checking the sparsity of v (.nnz) adds nearly another 0.2 ms.
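Timings like these depend on the machine and library versions. As a rough sketch of how they could be measured, the snippet below times the .todense() approach with the standard timeit module for 1 and 200 non-zeros. The helper time_call and the function name via_todense are hypothetical, and the numbers it prints are not expected to match the figures quoted above.

```python
import timeit

import numpy as np
from scipy.sparse import csc_matrix


def time_call(func, *args, repeat=3, number=5):
    # Best average time per call, in milliseconds, over `repeat` runs.
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e3


def via_todense(dense_matrix, sparse_column):
    # Densify v, then use np.dot (the approach of method 2).
    return np.dot(dense_matrix, sparse_column.todense())


rng = np.random.default_rng(0)
A = rng.random((512, 2048))
results = {}
for nnz in (1, 200):
    dense_v = np.zeros((2048, 1))
    dense_v[rng.choice(2048, size=nnz, replace=False), 0] = 1.0
    v = csc_matrix(dense_v)
    results[nnz] = time_call(via_todense, A, v)
    print(f"nnz={v.nnz}: {results[nnz]:.3f} ms per call")
```

Any other candidate function with the same (dense_matrix, sparse_column) signature can be passed to time_call in the same way.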
I have to perform about 1500 of these multiplications (after splitting up my data and multiprocessing), so the time adds up.
[EDIT: Adding a simple representative example:

    import numpy as np
    from scipy.sparse import csc_matrix

    rows = 512
    cols = 2048
    sparsity = 0.001  # very sparse: 0.001 for ~1 non-zero; moderately sparse: 0.1 for ~200 non-zeros
    big_matrix = np.random.rand(rows, cols)  # use as the dense matrix
    col = np.random.rand(cols, 1)
    col[col >= sparsity] = 0.0  # keep only the entries below the threshold
    sparse_col = csc_matrix(col)  # use as the (cols x 1) sparse vector
    print(sparse_col.nnz)

END EDIT]
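Thresholding uniform random values controls the number of non-zeros only on average. For benchmarking at a specific count, here is a sketch that builds an (N x 1) vector with exactly k non-zeros using the csc_matrix((data, (row, col)), shape=...) constructor. The helper name random_sparse_column and its seed parameter are hypothetical.

```python
import numpy as np
from scipy.sparse import csc_matrix


def random_sparse_column(n, k, seed=None):
    # Hypothetical helper: an (n x 1) CSC vector with exactly k non-zero entries.
    rng = np.random.default_rng(seed)
    row = rng.choice(n, size=k, replace=False)  # k distinct row positions
    col = np.zeros(k, dtype=int)                # everything sits in column 0
    data = rng.random(k) + 0.5                  # values in [0.5, 1.5), never zero
    return csc_matrix((data, (row, col)), shape=(n, 1))


very_sparse = random_sparse_column(2048, 1, seed=0)
moderately_sparse = random_sparse_column(2048, 200, seed=0)
print(very_sparse.nnz, moderately_sparse.nnz)  # 1 200
```

This also avoids allocating the dense column first, although at N = 2048 that cost is small.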
I am looking for a single implementation that is fast for both very sparse and moderately sparse v.
Source: https://stackoverflow.com/questions/29871460/efficiently-multiply-a-dense-matrix-by-a-sparse-vector