Putting a column into an empty sparse matrix

江枫思渺然 submitted on 2020-02-03 00:05:24

Question


I want to put a column from one sparse columnar matrix into another (empty) sparse columnar matrix. Toy code:

import numpy as np
import scipy.sparse
row = np.array([0, 2, 0, 1, 2])
col = np.array([0, 0, 2, 2, 2])
data = np.array([1, 2, 4, 5, 6])
M=scipy.sparse.csc_matrix((data, (row, col)), shape=(3, 3))
E=scipy.sparse.csc_matrix((3, 3)) #empty 3x3 sparse matrix

E[:,1]=M[:,0]

However I get the warning:

SparseEfficiencyWarning: Changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient.

This warning makes me worry that the matrix is being converted to another format and then back to csc along the way, which would be inefficient. Can anyone confirm this, and is there a better solution?


Answer 1:


The warning is telling you that the process of setting new values in a csc (or csr) format matrix is complicated. Those formats aren't designed for easy changes like this. The lil format is designed to make that kind of change quick and easy, especially making changes in one row.

Note that the coo format doesn't even implement this kind of indexing.

It isn't converting to lil and back, but that might actually be a faster way. We'd have to do some time tests.
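As a quick check of the claim that no format conversion happens, here is a minimal sketch that repeats the question's assignment and then inspects the result. The names `fmt_after` and `dense_after` are just illustrative, and the warning is silenced only for this one statement:

```python
import warnings

import numpy as np
import scipy.sparse as sparse

row = np.array([0, 2, 0, 1, 2])
col = np.array([0, 0, 2, 2, 2])
data = np.array([1, 2, 4, 5, 6])
M = sparse.csc_matrix((data, (row, col)), shape=(3, 3))
E = sparse.csc_matrix((3, 3))  # empty 3x3 sparse matrix

with warnings.catch_warnings():
    # Ignore the efficiency warning inside this block only.
    warnings.simplefilter("ignore", sparse.SparseEfficiencyWarning)
    E[:, 1] = M[:, 0]

fmt_after = E.format       # storage format of E after the assignment
dense_after = E.toarray()  # dense copy, only for inspection
print(fmt_after)
print(dense_after)
```

The assignment goes through and `E` is still reported as a csc matrix afterwards; the warning is about cost, not about correctness.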

In [679]: %%timeit E=sparse.csr_matrix((3,3))
     ...: E[:,1] = M[:,0]
     ...: 
/usr/lib/python3/dist-packages/scipy/sparse/compressed.py:730: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
  SparseEfficiencyWarning)
1000 loops, best of 3: 845 µs per loop
In [680]: %%timeit E=sparse.csr_matrix((3,3))
     ...: E1=E.tolil()
     ...: E1[:,1] = M[:,0]
     ...: E=E1.tocsc()
     ...: 
The slowest run took 4.22 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.42 ms per loop

In [682]: %%timeit E=sparse.lil_matrix((3,3))
     ...: E[:,1] = M[:,0]
     ...: 
1000 loops, best of 3: 804 µs per loop
In [683]: %%timeit E=sparse.lil_matrix((3,3));M1=M.tolil()
     ...: E[:,1] = M1[:,0]
     ...: 
     ...: 
1000 loops, best of 3: 470 µs per loop

In [688]: timeit M1=M.tolil()
The slowest run took 4.10 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 248 µs per loop

Notice that doing the assignment with lil on both sides (470 µs) is roughly 2x faster than doing it with a compressed format (845 µs with csr above). But conversion to/from lil takes time too: `M.tolil()` alone costs about 248 µs.
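If several columns have to be copied, the timings above suggest paying the lil conversion once and doing all assignments in lil. The following is only a sketch under that assumption; `col_map` (destination column -> source column) is a made-up example mapping, not something from the question:

```python
import numpy as np
import scipy.sparse as sparse

row = np.array([0, 2, 0, 1, 2])
col = np.array([0, 0, 2, 2, 2])
data = np.array([1, 2, 4, 5, 6])
M = sparse.csc_matrix((data, (row, col)), shape=(3, 3))

M_lil = M.tolil()                 # convert the source once
E_lil = sparse.lil_matrix((3, 3)) # build the target in lil format

# Hypothetical mapping: destination column -> source column.
col_map = {1: 0, 0: 2}
for dst, src in col_map.items():
    E_lil[:, dst] = M_lil[:, src]  # no SparseEfficiencyWarning in lil

E = E_lil.tocsc()                 # convert back once, at the end
print(E.toarray())
```

Whether this beats direct csc assignment depends on how many columns are copied and on the matrix size, so it is worth timing on real data.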

Warning or not, what you are doing is the fastest option for a one-time operation. But if you need to do this repeatedly, try to find a better way.
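One possible "better way" (my suggestion, not something timed above) is to avoid item assignment altogether and assemble the target from column blocks with `scipy.sparse.hstack`. A minimal sketch, assuming the columns that should stay empty are known up front:

```python
import numpy as np
import scipy.sparse as sparse

row = np.array([0, 2, 0, 1, 2])
col = np.array([0, 0, 2, 2, 2])
data = np.array([1, 2, 4, 5, 6])
M = sparse.csc_matrix((data, (row, col)), shape=(3, 3))

n_rows = M.shape[0]
# A single all-zero column, reused wherever the target should stay empty.
empty_col = sparse.csc_matrix((n_rows, 1), dtype=M.dtype)

# Column 1 of the result is column 0 of M; columns 0 and 2 stay empty.
columns = [empty_col, M[:, 0], empty_col]
E = sparse.hstack(columns, format="csc")
print(E.toarray())
```

This builds the sparsity structure in one step instead of changing it after the fact, so no SparseEfficiencyWarning is involved.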

=================

Setting rows vs. columns doesn't make much difference.

In [835]: %%timeit E=sparse.csc_matrix((3,3))
     ...: E[:,1]=M[:,0]
  SparseEfficiencyWarning)
1000 loops, best of 3: 1.89 ms per loop

In [836]: %%timeit E=sparse.csc_matrix((3,3))
     ...: E[1,:]=M[0,:]    
  SparseEfficiencyWarning)
1000 loops, best of 3: 1.91 ms per loop


Source: https://stackoverflow.com/questions/40722899/putting-column-into-empty-sparse-matrix
