Question
I have an extremely sparse structured matrix. My matrix has exactly one non-zero entry per column, but it's huge (10k × 1M) and given in the following form (using random values for the example):
import numpy as np

rows = np.random.randint(0, 10000, 1000000)
values = np.random.randint(0, 10, 1000000)
where rows gives the row number of the non-zero entry in each column. I want fast matrix multiplication with S, and right now I do the following: I convert this form to a sparse matrix S and compute S.dot(X) for multiplication with a matrix X (which can be sparse or dense).
import scipy.sparse

S = scipy.sparse.csr_matrix((values, (rows, np.arange(1000000))), shape=(10000, 1000000))
For X of size 1M × 2500 with nnz(X) = 8M, this takes 178 ms to create S and 255 ms to apply it. So my question is: what is the best way of computing SX (where X could be sparse or dense), given that my S is as described? Since creating S is itself very time-consuming, I was thinking of something ad hoc. I did try writing something with loops, but it's not even close.
A simple looping procedure looks something like this:
SX = np.zeros((rows.max() + 1, X.shape[1]))   # one output row per row of S
for i in range(X.shape[0]):
    SX[rows[i], :] += values[i] * X[i, :]
return SX
Can we make this efficient?
Any suggestions are greatly appreciated. Thanks
Answer 1:
Approach #1
Given that there's exactly one entry per column in the first input, we can use np.bincount with the inputs rows, values and X, which also avoids creating the sparse matrix S:
def sparse_matrix_mult(rows, values, X):
    nrows = rows.max() + 1
    ncols = X.shape[1]
    nelem = nrows * ncols

    # Flattened (column-major) output position of each scaled entry,
    # accumulated with a weighted bincount and reshaped back.
    ids = rows + nrows * np.arange(ncols)[:, None]
    sums = np.bincount(ids.ravel(), (X.T * values).ravel(), minlength=nelem)
    out = sums.reshape(ncols, -1).T
    return out
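The workhorse here is np.bincount with weights. As a quick illustration of that primitive (not part of the original answer, just a small example of what it computes):

import numpy as np

idx = np.array([0, 2, 2, 1])
w = np.array([5., 1., 2., 3.])
# result[i] = sum of w where idx == i
print(np.bincount(idx, w, minlength=4))   # [5. 3. 3. 0.]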
Sample run -
In [746]: import numpy as np
...: from scipy.sparse import csr_matrix
...: import scipy as sp
...:
In [747]: np.random.seed(1234)
...: m,n = 3,4
...: rows = np.random.randint(0, m, n)
...: values = np.random.randint(2,10,n)
...: X = np.random.randint(2, 10, (n,5))
...:
In [748]: S = csr_matrix( (values, (rows, sp.arange(n))), shape = (m,n))
In [749]: S.dot(X)
Out[749]:
array([[42, 27, 45, 78, 87],
       [24, 18, 18, 12, 24],
       [18,  6,  8, 16, 10]])
In [750]: sparse_matrix_mult(rows, values, X)
Out[750]:
array([[ 42.,  27.,  45.,  78.,  87.],
       [ 24.,  18.,  18.,  12.,  24.],
       [ 18.,   6.,   8.,  16.,  10.]])
Approach #2
Using np.add.reduceat to replace np.bincount:
def sparse_matrix_mult_v2(rows, values, X):
    nrows = rows.max() + 1
    ncols = X.shape[1]

    # Scale each row of X by its value, group entries belonging to the same
    # row of S together by sorting, then sum each group with reduceat.
    scaled_ar = X * values[:, None]
    sidx = rows.argsort()
    rows_s = rows[sidx]
    cut_idx = np.concatenate(([0], np.flatnonzero(rows_s[1:] != rows_s[:-1]) + 1))
    sums = np.add.reduceat(scaled_ar[sidx], cut_idx, axis=0)

    # Scatter the group sums back to their row positions
    # (rows of S with no entries are left uninitialized here).
    out = np.empty((nrows, ncols), dtype=sums.dtype)
    row_idx = rows_s[cut_idx]
    out[row_idx] = sums
    return out
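For readers unfamiliar with np.add.reduceat: it sums the contiguous slices between the given cut indices. A tiny illustration (again not from the original answer):

import numpy as np

a = np.arange(10.)
# sums a[0:3], a[3:7], a[7:]
print(np.add.reduceat(a, [0, 3, 7]))   # [ 3. 18. 24.]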
Runtime test
I could not run it on the sizes mentioned in the question, as those were too big for me to handle. So, on reduced datasets, here's what I am getting -
In [149]: m,n = 1000, 100000
...: rows = np.random.randint(0, m, n)
...: values = np.random.randint(2,10,n)
...: X = np.random.randint(2, 10, (n,2500))
...:
In [150]: S = csr_matrix( (values, (rows, sp.arange(n))), shape = (m,n))
In [151]: %timeit csr_matrix( (values, (rows, sp.arange(n))), shape = (m,n))
100 loops, best of 3: 16.1 ms per loop
In [152]: %timeit S.dot(X)
1 loop, best of 3: 193 ms per loop
In [153]: %timeit sparse_matrix_mult(rows, values, X)
1 loop, best of 3: 4.4 s per loop
In [154]: %timeit sparse_matrix_mult_v2(rows, values, X)
1 loop, best of 3: 2.81 s per loop
So, the proposed methods don't seem to beat the sparse S.dot(X) on performance, but they should be good on memory efficiency.
For sparse X
For a sparse X, we need some modifications, as in the modified method below:
from scipy.sparse import find

def sparse_matrix_mult_sparseX(rows, values, Xs):  # Xs is sparse
    nrows = rows.max() + 1
    ncols = Xs.shape[1]
    nelem = nrows * ncols

    # Scale the non-zeros of Xs row-wise by values, pull them out as
    # (row, col, value) triplets, map each to its flattened output position
    # and accumulate with a weighted bincount.
    scaled_vals = Xs.multiply(values[:, None])
    r, c, v = find(scaled_vals)
    ids = rows[r] + c * nrows
    sums = np.bincount(ids, v, minlength=nelem)
    out = sums.reshape(ncols, -1).T
    return out
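The answer gives no sample run for the sparse-X variant; a minimal sanity check against S.dot (added here for illustration, with small hypothetical sizes and scipy.sparse.random for generating Xs) could look like this:

import numpy as np
import scipy.sparse as sp

# Illustrative sizes only; every row index 0..m-1 is forced to occur so that
# rows.max() + 1 == m and the output shapes match.
m, n, k = 30, 200, 15
rows = np.r_[np.arange(m), np.random.randint(0, m, n - m)]
values = np.random.randint(2, 10, n)
Xs = sp.random(n, k, density=0.05, format='csr')

S = sp.csr_matrix((values, (rows, np.arange(n))), shape=(m, n))
print(np.allclose(S.dot(Xs).toarray(),
                  sparse_matrix_mult_sparseX(rows, values, Xs)))  # expected: True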
Answer 2:
Inspired by this post, Fastest way to sum over rows of sparse matrix, I have found that the best way to do this is to write the loops explicitly and port them to numba. Here is the code:
import numpy as np
from numba import njit

@njit
def sparse_mul(SX, row, col, data, values, row_map):
    # (row, col, data) are the COO triplets of X; row_map[j] gives the output
    # row for column j of S, and values[j] its single non-zero value.
    N = len(data)
    for idx in range(N):
        SX[row_map[row[idx]], col[idx]] += data[idx] * values[row[idx]]
    return SX

X_coo = X.tocoo()
s = row_map.max() + 1
SX = np.zeros((s, X.shape[1]))
sparse_mul(SX, X_coo.row, X_coo.col, X_coo.data, values, row_map)
Here row_map is rows from the question. For X of size (1M × 1K), 1% sparsity and s = 10K, this performs about twice as fast as forming the sparse matrix from row_map and doing S.dot(X).
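The snippet above assumes X, row_map and values already exist. For context, a minimal, self-contained way to exercise it (illustrative small sizes, and assuming numba is installed) might be:

import numpy as np
import scipy.sparse as sparse

# Hypothetical small-scale data; row_map and values play the same roles as in the question.
n, k = 5000, 50
row_map = np.random.randint(0, 100, n)
values = np.random.randint(0, 10, n)
X = sparse.random(n, k, density=0.01, format='csr')

X_coo = X.tocoo()
s = row_map.max() + 1
SX = np.zeros((s, X.shape[1]))
sparse_mul(SX, X_coo.row, X_coo.col, X_coo.data, values, row_map)

# Cross-check against the sparse-matrix route from the question.
S = sparse.csr_matrix((values, (row_map, np.arange(n))), shape=(s, n))
print(np.allclose(S.dot(X).toarray(), SX))  # expected: True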
Answer 3:
As I recall, Knuth's TAOCP talks about representing a sparse matrix instead as linked lists of (for your application) the non-zero values. Maybe something like that? Then you traverse the linked lists rather than the entire array along each dimension.
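A minimal sketch of that idea (hypothetical helpers, not taken from Knuth's text): keep each row's non-zeros as a list of (column, value) nodes built from the question's (rows, values) form, then multiply by walking only those nodes instead of the full 10k × 1M array.

import numpy as np

def build_row_lists(rows, values):
    # One list of (column, value) nodes per row of S.
    row_lists = [[] for _ in range(rows.max() + 1)]
    for col, (r, v) in enumerate(zip(rows, values)):
        row_lists[r].append((col, v))
    return row_lists

def multiply_row_lists(row_lists, X):
    # Compute S @ X by traversing only the stored non-zero entries.
    out = np.zeros((len(row_lists), X.shape[1]))
    for r, nodes in enumerate(row_lists):
        for col, v in nodes:
            out[r] += v * X[col]
    return out

This is plain Python and will be slow at the question's scale, but it shows the traversal pattern the answer is hinting at.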
Source: https://stackoverflow.com/questions/46456396/what-is-the-fastest-way-to-multiply-with-extremely-sparse-matrix