Creating a scipy.sparse.lil_matrix from a Python generator efficiently

Anonymous (unverified), submitted 2019-12-03 09:19:38

Question:

I have a generator that yields one-dimensional numpy arrays, all of the same length, and I would like a sparse matrix containing that data. Rows are generated in the order I want them in the final matrix. A csr matrix is preferable to a lil matrix, but I assume the latter will be easier to build in the scenario I'm describing.

Assuming row_gen is a generator yielding numpy.array rows, the following code works as expected.

    import numpy
    import scipy.sparse

    def row_gen():
        yield numpy.array([1, 2, 3])
        yield numpy.array([1, 0, 1])
        yield numpy.array([1, 0, 0])

    # Works, but materializes every row in a list first
    matrix = scipy.sparse.lil_matrix(list(row_gen()))

Because building the list throws away the advantage of using a generator, I'd like the following to produce the same result. More specifically, I cannot hold the entire dense matrix (or a list of all matrix rows) in memory:

    def row_gen():
        yield numpy.array([1, 2, 3])
        yield numpy.array([1, 0, 1])
        yield numpy.array([1, 0, 0])

    matrix = scipy.sparse.lil_matrix(row_gen())

However, it raises the following exception when run:

    TypeError: no supported conversion for types: (dtype('O'),)

I also noticed the trace includes the following:

    File "/usr/local/lib/python2.7/site-packages/scipy/sparse/lil.py", line 122, in __init__
      A = csr_matrix(A, dtype=dtype).tolil()

This makes me think that scipy.sparse.lil_matrix ends up creating a csr matrix and only then converting it to a lil matrix. In that case I would rather just create a csr matrix to begin with.

To recap, my question is: what is the most efficient way to create a scipy.sparse matrix from a Python generator of one-dimensional numpy arrays?

Answer 1:

Let's look at the code for sparse.lil_matrix. It checks the first argument:

    if isspmatrix(arg1):    # is it already a sparse matrix
        ...
    elif isinstance(arg1,tuple):    # is it the shape tuple
        if isshape(arg1):
            if shape is not None:
                raise ValueError('invalid use of shape parameter')
            M, N = arg1
            self.shape = (M,N)
            self.rows = np.empty((M,), dtype=object)
            self.data = np.empty((M,), dtype=object)
            for i in range(M):
                self.rows[i] = []
                self.data[i] = []
        else:
            raise TypeError('unrecognized lil_matrix constructor usage')
    else:
        # assume A is dense
        try:
            A = np.asmatrix(arg1)
        except TypeError:
            raise TypeError('unsupported matrix type')
        else:
            from .csr import csr_matrix
            A = csr_matrix(A, dtype=dtype).tolil()

            self.shape = A.shape
            self.dtype = A.dtype
            self.rows = A.rows
            self.data = A.data

As the documentation says, you can construct it from another sparse matrix, from a shape, or from a dense array. The dense array constructor first makes a csr matrix, and then converts it to lil.

The shape version constructs an empty lil with data like:

    In [161]: M=sparse.lil_matrix((3,5),dtype=int)
    In [163]: M.data
    Out[163]: array([[], [], []], dtype=object)
    In [164]: M.rows
    Out[164]: array([[], [], []], dtype=object)

It should be obvious that passing a generator isn't going to work: it isn't a sparse matrix, a shape tuple, or a dense array.
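To see where the dtype('O') in the error message comes from, here is a small check. It uses np.asarray as a stand-in for the np.asmatrix call in the constructor (my assumption is that both wrap the generator the same way): numpy does not iterate the generator, it stores the generator object itself in a zero-dimensional object array.

```python
import numpy as np

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

# numpy does not consume the generator; it wraps the generator object itself
wrapped = np.asarray(row_gen())

print(wrapped.shape)  # zero-dimensional: no rows were read
print(wrapped.dtype)  # object dtype, which the csr constructor cannot convert
```
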

But having created a lil matrix, you can fill in elements with a regular array assignment:

    In [167]: M[0,:]=[1,0,2,0,0]
    In [168]: M[1,:]=[0,0,2,0,0]
    In [169]: M[2,3:]=[1,1]
    In [170]: M.data
    Out[170]: array([[1, 2], [2], [1, 1]], dtype=object)
    In [171]: M.rows
    Out[171]: array([[0, 2], [2], [3, 4]], dtype=object)
    In [172]: M.A
    Out[172]:
    array([[1, 0, 2, 0, 0],
           [0, 0, 2, 0, 0],
           [0, 0, 0, 1, 1]])

and you can assign values to the sublists directly (I think this is faster, but a little more dangerous):

    In [173]: M.data[1]=[1,2,3]
    In [174]: M.rows[1]=[0,2,4]
    In [176]: M.A
    Out[176]:
    array([[1, 0, 2, 0, 0],
           [1, 0, 2, 0, 3],
           [0, 0, 0, 1, 1]])
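Applied to the generator from the question, the row-assignment style might look like the sketch below. It assumes the number of rows and columns is known before iterating, since the empty lil matrix has to be allocated with a shape up front; lil_from_rows is a hypothetical helper name, not part of scipy.

```python
import numpy as np
from scipy import sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def lil_from_rows(rows, n_rows, n_cols, dtype=int):
    """Hypothetical helper: fill a preallocated lil matrix one row at a time."""
    M = sparse.lil_matrix((n_rows, n_cols), dtype=dtype)
    for i, row in enumerate(rows):
        # Only one dense row is alive at any time
        M[i, :] = row
    return M

# Convert to csr once, after all rows have been filled in
matrix = lil_from_rows(row_gen(), n_rows=3, n_cols=3).tocsr()
```

Only the sparse structure and the current dense row are in memory at once, which is the constraint stated in the question.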

Another incremental approach is to construct the 3 arrays or lists of coo format, and then make a coo or csr from those.
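A minimal sketch of that coo route, under the assumption that each yielded row is a one-dimensional array of the same length. The helper name csr_from_rows is made up for illustration. Unlike the lil version, it does not need the row count in advance, and the lists it accumulates grow with the number of nonzeros rather than with the dense size.

```python
import numpy as np
from scipy import sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

def csr_from_rows(rows):
    """Hypothetical helper: collect coo-style (data, row, col) pieces per row."""
    data_parts, row_parts, col_parts = [], [], []
    n_rows, n_cols = 0, 0
    for i, row in enumerate(rows):
        row = np.asarray(row)
        cols = np.flatnonzero(row)               # column indices of nonzero entries
        data_parts.append(row[cols])             # the nonzero values themselves
        col_parts.append(cols)
        row_parts.append(np.full(cols.shape, i)) # row index repeated per nonzero
        n_rows, n_cols = i + 1, row.shape[0]
    if n_rows == 0:
        raise ValueError("generator yielded no rows")
    data = np.concatenate(data_parts)
    r = np.concatenate(row_parts)
    c = np.concatenate(col_parts)
    return sparse.coo_matrix((data, (r, c)), shape=(n_rows, n_cols)).tocsr()

matrix = csr_from_rows(row_gen())
```
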

sparse.bmat is another option, and its code is a good example of building the coo inputs. I'll let you look at that yourself.
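For a quick variant in the same spirit, each dense row can be turned into a 1-by-N sparse matrix and the pieces stacked with sparse.vstack, which stacks sparse blocks vertically much as a single-column bmat layout would. This is only a sketch: it keeps a list of sparse rows (not dense ones) in memory, and creating one small matrix per row has overhead that I have not measured.

```python
import numpy as np
from scipy import sparse

def row_gen():
    yield np.array([1, 2, 3])
    yield np.array([1, 0, 1])
    yield np.array([1, 0, 0])

# Each dense row becomes a 1 x N sparse matrix as soon as it is generated,
# so the dense version of a row can be discarded immediately afterwards.
sparse_rows = [sparse.csr_matrix(row.reshape(1, -1)) for row in row_gen()]

# Stack the sparse rows vertically into a single csr matrix
matrix = sparse.vstack(sparse_rows, format="csr")
```
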


