Fastest way to sum over rows of sparse matrix

强颜欢笑 提交于 2019-12-22 08:26:35

问题


I have a big csr_matrix(1M*1K) and I want to add over rows and obtain a new csr_matrix with the same number of columns but reduced number of rows. Actually my problem is exactly same as this Sum over rows in scipy.sparse.csr_matrix. The only thing is I find the accepted solution to be slow for my purpose. Let me state what I have

map_fn = np.random.randint(0, 10000, 1000000)

map_fn here tells me how my input rows(1M) are mapped into my output rows(10K). For example ith input row gets added up into map_fn[i] output row. I tried the two approaches mentioned in the above question, namely forming a sparse matrix and using sparse sum. Although the sparse matrix approach looks way better than sparse sum approach but I find it slow for my purpose. Here is the code comparing two approaches:

import scipy.sparse
import numpy as np 
import time

print "Setting up input"
s=10000
n=1000000
d=1000
density=1.0/500

X=scipy.sparse.rand(n,d,density=density,format="csr")
map_fn=np.random.randint(0, s, n)

# Approach 1
start_time=time.time()
col = scipy.arange(n)
val = np.ones(n)
S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
print "Approach 1 Creation time : ",time.time()-start_time
SX = S.dot(X)
print "Approach 1 Total time : ",time.time()-start_time

#Approach 2
start_time=time.time()
SX = np.zeros((s,X.shape[1]))
for i in range(SX.shape[0]):
    SX[i,:] = X[np.where(map_fn==i)[0],:].sum(axis=0)

print "Approach 2 Total time : ",time.time()-start_time

which gives following numbers:

Approach 1 Creation time :  0.187678098679
Approach 1 Total time :  0.286989927292
Approach 2 Total time :  10.208632946

So my question is this is there a better way of doing this? I find forming sparse matrix to be an overkill as it takes more than half of the time. Are there any better alternatives? Any suggestions are greatly appreciated. Thanks


回答1:


Starting approach

Adapting sparse solution from this post -

def sparse_matrix_mult_sparseX_mod1(X, rows):   
    nrows = rows.max()+1
    ncols = X.shape[1]
    nelem = nrows * ncols

    a,b = X.nonzero()
    ids = rows[a] + b*nrows
    sums = np.bincount(ids, X[a,b].A1, minlength=nelem)
    out = sums.reshape(ncols,-1).T
    return out

Benchmarking

Original approach #1 -

def app1(X, map_fn):
    col = scipy.arange(n)
    val = np.ones(n)
    S = scipy.sparse.csr_matrix( (val, (map_fn, col)), shape = (s,n))
    SX = S.dot(X)
    return SX

Timings and verification -

In [209]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/500
     ...: 
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...: 

In [210]: out1 = app1(X, map_fn)
     ...: out2 = sparse_matrix_mult_sparseX_mod1(X, map_fn)
     ...: print np.allclose(out1.toarray(), out2)
     ...: 
True

In [211]: %timeit app1(X, map_fn)
1 loop, best of 3: 517 ms per loop

In [212]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
10 loops, best of 3: 147 ms per loop

To be fair, we should time the final dense array version from app1 -

In [214]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 584 ms per loop

Porting to Numba

We could translate the binned counting step to numba, which might be beneficial for denser input matrices. One of the ways to do so would be -

from numba import njit

@njit
def bincount_mod2(out, rows, r, C, V):
    N = len(V)
    for i in range(N):
        out[rows[r[i]], C[i]] += V[i]
    return out

def sparse_matrix_mult_sparseX_mod2(X, rows):
    nrows = rows.max()+1
    ncols = X.shape[1]
    r,C = X.nonzero()

    V = X[r,C].A1
    out = np.zeros((nrows, ncols))
    return bincount_mod2(out, rows, r, C, V)

Timings -

In [373]: # Inputs setup
     ...: s=10000
     ...: n=1000000
     ...: d=1000
     ...: density=1.0/100 # Denser now!
     ...: 
     ...: X=scipy.sparse.rand(n,d,density=density,format="csr")
     ...: map_fn=np.random.randint(0, s, n)
     ...: 

In [374]: %timeit app1(X, map_fn)
1 loop, best of 3: 787 ms per loop

In [375]: %timeit sparse_matrix_mult_sparseX_mod1(X, map_fn)
1 loop, best of 3: 906 ms per loop

In [376]: %timeit sparse_matrix_mult_sparseX_mod2(X, map_fn)
1 loop, best of 3: 705 ms per loop

With the dense output from app1 -

In [379]: %timeit app1(X, map_fn).toarray()
1 loop, best of 3: 910 ms per loop


来源:https://stackoverflow.com/questions/46475895/fastest-way-to-sum-over-rows-of-sparse-matrix

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!