Efficient incremental sparse matrix in python / scipy / numpy

烈酒焚心 提交于 2019-12-13 15:22:30

问题


Is there a way in Python to have an efficient incremental update of sparse matrix?

 H = lil_matrix((n,m))
 for (i,j) in zip(A,B):
   h(i,j) += compute_something  

It seems that such a way to build a sparse matrix is quite slow (lil_matrix is the fastest sparse matrix type for that).

Is there a way (like using dict of dict or other kind of approaches) to efficiently build the sparse matrix H?


回答1:


In https://stackoverflow.com/a/27771335/901925 I explore incremental matrix assignment.

lol and dok are the recommended formats if you want to change values. csr will give you an efficiency warning, and coo does not allow indexing.

But I also found that dok indexing is slow compared to regular dictionary indexing. So for many changes it is better to build a plain dictionary (with the same tuple indexing), and build the dok matrix from that.

But if you can calculate the H data values with a fast numpy vector operation, as opposed to iteration, it is best to do so, and construct the sparse matrix from that (e.g. coo format). In fact even with iteration this would be faster:

 h = np.zeros(A.shape)
 for k, (i,j) in enumerate(zip(A,B)):
    h[k] = compute_something 
 H = sparse.coo_matrix((h, (A, B)), shape=(n,m))

e.g.

In [780]: A=np.array([0,1,1,2]); B=np.array([0,2,2,1])
In [781]: h=np.zeros(A.shape)
In [782]: for k, (i,j) in enumerate(zip(A,B)):
    h[k] = i+j+k
   .....:     
In [783]: h
Out[783]: array([ 0.,  4.,  5.,  6.])
In [784]: M=sparse.coo_matrix((h,(A,B)),shape=(4,4))
In [785]: M
Out[785]: 
<4x4 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in COOrdinate format>
In [786]: M.A
Out[786]: 
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  9.,  0.],
       [ 0.,  6.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

Note that the (1,2) value is the sum 4+5. That's part of the coo to csr conversion.

In this case I could have calculated h with:

In [791]: A+B+np.arange(A.shape[0])
Out[791]: array([0, 4, 5, 6])

so there's no need for iteration.




回答2:


Nope, do not use csr_matrix or csc_matrix, as they are going to be even more slower than lil_matrix, if you construct them incrementally. The Dictionary of Key based sparse matrix is exactly what you are looking for

from scipy.sparse import dok_matrix
S = dok_matrix((5, 5), dtype=np.float32)
for i in range(5):
    for j in range(5):
        S[i,j] = i+j    # Update elements



回答3:


A faster way would be:

H_ij = compute_something_vectorized()
H = coo_matrix((H_ij, (A, B))).tocsr()

The data for duplicate coordinates are then summed, see the docs for coo_matrix.



来源:https://stackoverflow.com/questions/35773101/efficient-incremental-sparse-matrix-in-python-scipy-numpy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!