问题
Is there a way in Python to have an efficient incremental update of sparse matrix?
H = lil_matrix((n,m))
for (i,j) in zip(A,B):
h(i,j) += compute_something
It seems that such a way to build a sparse matrix is quite slow (lil_matrix
is the fastest sparse matrix type for that).
Is there a way (like using dict of dict or other kind of approaches) to efficiently build the sparse matrix H?
回答1:
In https://stackoverflow.com/a/27771335/901925 I explore incremental matrix assignment.
lol
and dok
are the recommended formats if you want to change values. csr
will give you an efficiency warning, and coo
does not allow indexing.
But I also found that dok
indexing is slow compared to regular dictionary indexing. So for many changes it is better to build a plain dictionary (with the same tuple indexing), and build the dok
matrix from that.
But if you can calculate the H
data values with a fast numpy
vector operation, as opposed to iteration, it is best to do so, and construct the sparse matrix from that (e.g. coo
format). In fact even with iteration this would be faster:
h = np.zeros(A.shape)
for k, (i,j) in enumerate(zip(A,B)):
h[k] = compute_something
H = sparse.coo_matrix((h, (A, B)), shape=(n,m))
e.g.
In [780]: A=np.array([0,1,1,2]); B=np.array([0,2,2,1])
In [781]: h=np.zeros(A.shape)
In [782]: for k, (i,j) in enumerate(zip(A,B)):
h[k] = i+j+k
.....:
In [783]: h
Out[783]: array([ 0., 4., 5., 6.])
In [784]: M=sparse.coo_matrix((h,(A,B)),shape=(4,4))
In [785]: M
Out[785]:
<4x4 sparse matrix of type '<class 'numpy.float64'>'
with 4 stored elements in COOrdinate format>
In [786]: M.A
Out[786]:
array([[ 0., 0., 0., 0.],
[ 0., 0., 9., 0.],
[ 0., 6., 0., 0.],
[ 0., 0., 0., 0.]])
Note that the (1,2) value is the sum 4+5. That's part of the coo
to csr
conversion.
In this case I could have calculated h
with:
In [791]: A+B+np.arange(A.shape[0])
Out[791]: array([0, 4, 5, 6])
so there's no need for iteration.
回答2:
Nope, do not use csr_matrix
or csc_matrix
, as they are going to be even more slower than lil_matrix
, if you construct them incrementally. The Dictionary of Key based sparse matrix is exactly what you are looking for
from scipy.sparse import dok_matrix
S = dok_matrix((5, 5), dtype=np.float32)
for i in range(5):
for j in range(5):
S[i,j] = i+j # Update elements
回答3:
A faster way would be:
H_ij = compute_something_vectorized()
H = coo_matrix((H_ij, (A, B))).tocsr()
The data for duplicate coordinates are then summed, see the docs for coo_matrix.
来源:https://stackoverflow.com/questions/35773101/efficient-incremental-sparse-matrix-in-python-scipy-numpy