I am creating some numpy arrays with word counts in Python: rows are documents, columns are counts for word X. If I have a lot of zero counts, people suggest using sparse matric
@hpaulj Your timeit is wrong, u are getting slow results cause of mapping sparse.random to numpy array (its slowish) with that in mind:
M=sparse.random(1000,1000,.5)
Ma=M.toarray()
%timeit -n 25 M1=M*M
352 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 25 loops each)
%timeit -n 25 M2=Ma.dot(Ma)
13.5 ms ± 2.17 ms per loop (mean ± std. dev. of 7 runs, 25 loops each)
To get close to numpy we need to have
M=sparse.random(1000,1000,.03)
%timeit -n 25 M1=M*M
10.7 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 25 loops each)
%timeit -n 25 M2=Ma.dot(Ma)
11.4 ms ± 564 µs per loop (mean ± std. dev. of 7 runs, 25 loops each)