Using a sparse matrix versus numpy array

Happy的楠姐 2021-02-01 03:23

I am creating some numpy arrays with word counts in Python: rows are documents, columns are counts for word X. If I have a lot of zero counts, people suggest using sparse matric

3 Answers
  •  终归单人心
    2021-02-01 04:06

    @hpaulj Your timeit is wrong: you are getting slow results because the timing includes mapping the sparse.random matrix to a numpy array, which is slowish. With that in mind:

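    from scipy import sparse

    M=sparse.random(1000,1000,.5)   # 50% of the entries are nonzero
    Ma=M.toarray()                  # dense copy, built once outside the timed code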
    
    %timeit -n 25 M1=M*M
    352 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 25 loops each)
    
    %timeit -n 25 M2=Ma.dot(Ma)
    13.5 ms ± 2.17 ms per loop (mean ± std. dev. of 7 runs, 25 loops each)
    

    To get close to the numpy timing, the sparse matrix needs a density of about 3%:

    M=sparse.random(1000,1000,.03)  # 3% nonzero; Ma is still the dense array from above
    
    %timeit -n 25 M1=M*M
    10.7 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 25 loops each)
    
    %timeit -n 25 M2=Ma.dot(Ma)
    11.4 ms ± 564 µs per loop (mean ± std. dev. of 7 runs, 25 loops each)
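
For anyone without IPython, the same comparison can be sketched with the standard `timeit` module. This is only a rough sketch, not the exact benchmark above: the helper name `time_products`, the smaller default size `n=300`, and the repeat count are my own assumptions, and it uses `@` on a CSR matrix where the code above uses `M*M`. The point it illustrates is the same: the dense copy is built once, outside the timed code, so only the two products are measured.

```python
import timeit

import numpy as np
from scipy import sparse


def time_products(n=300, density=0.03, repeats=3):
    """Return (sparse_seconds, dense_seconds) for one n x n matrix product.

    Hypothetical helper for illustration; the sizes are assumptions.
    """
    # Build the sparse matrix and its dense copy BEFORE timing,
    # so the conversion cost is not counted.
    M = sparse.random(n, n, density=density, format="csr")
    Ma = M.toarray()

    # Take the best of several single runs for each product.
    t_sparse = min(timeit.repeat(lambda: M @ M, number=1, repeat=repeats))
    t_dense = min(timeit.repeat(lambda: Ma.dot(Ma), number=1, repeat=repeats))
    return t_sparse, t_dense


# Compare a few densities; the actual numbers depend on the machine and BLAS build.
for density in (0.5, 0.03, 0.005):
    t_sparse, t_dense = time_products(density=density)
    print(f"density={density}: sparse {t_sparse * 1e3:.2f} ms, dense {t_dense * 1e3:.2f} ms")
```

The density at which the sparse product catches up with the dense one will vary with matrix size and hardware, so the 3% figure above should be treated as a measurement on one machine rather than a fixed threshold.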
    
    
    
