Efficiently accumulating a collection of sparse scipy matrices

断了今生、忘了曾经 提交于 2019-12-04 04:26:29

I think I've found a way to speed it up by a factor of ~10 if your matrices are very sparse.

In [1]: from scipy.sparse import csr_matrix

In [2]: def sum_sparse(m):
   ...:     x = np.zeros(m[0].shape)
   ...:     for a in m:
   ...:         ri = np.repeat(np.arange(a.shape[0]),np.diff(a.indptr))
   ...:         x[ri,a.indices] += a.data
   ...:     return x
   ...: 

In [6]: m = [np.zeros((100,100)) for i in range(1000)]

In [7]: for x in m:
   ...:     x.ravel()[np.random.randint(0,x.size,10)] = 1.0
   ...:     

        m = [csr_matrix(x) for x in m]

In [17]: (sum(m[1:],m[0]).todense() == sum_sparse(m)).all()
Out[17]: True

In [18]: %timeit sum(m[1:],m[0]).todense()
10 loops, best of 3: 145 ms per loop

In [19]: %timeit sum_sparse(m)
100 loops, best of 3: 18.5 ms per loop

@user545424 has already posted what will likely be the fastest solution. Something in the same spirit that is more readable and ~same speed... nonzero() has all kinds of useful applications.

def sum_sparse(m):
        x = np.zeros(m[0].shape,m[0].dtype)
        for a in m:
            # old lines
            #ri = np.repeat(np.arange(a.shape[0]),np.diff(a.indptr))
            #x[ri,a.indices] += a.data
            # new line
            x[a.nonzero()] += a.data
        return x

Can't you just add them together before converting to a dense matrix?

>>> sum(my_sparse_matrices[1:],my_sparse_matrices[0]).todense()

This is not a complete answer (and I too would like to see a more detailed response), but you can gain an easy factor of two or more improvement by not creating intermediate results:

def sum_dense():
    return sum([m.toarray() for m in M])

def sum_dense2():
    return sum((m.toarray() for m in M))

On my machine (YMMV), this results in being the fastest computation. By placing the summation in a () rather than a [], we construct a generator rather than the whole list before the summation.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!