Scipy sparse matrix assignment using only stored elements

爱⌒轻易说出口 提交于 2019-12-13 07:15:32

问题


I have a large sparse matrix globalGrid (lil_matrix) and a smaller matrix localGrid (coo_matrix). The localGrid represents a subset of the globalGrid and I want to update the globalGrid with the localGrid. For this I use the following code (in Python Scipy):

globalGrid[xLocalgrid:xLocalgrid + localGrid.shape[0], yLocalgrid: yLocalgrid + localGrid.shape[1]] = localGrid

where xLocalGrid and yLocalGrid are the offset of the localGrid origin with respect to the globalGrid.

The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?

I have found about masked arrays in numpy, however that does not seem to apply to sparse scipy matrices.

edit: In response to the comments below, here is a example to illustrate what I mean:

First setup the matrices:

M=sparse.lil_matrix(2*np.ones([5,5]))
m = sparse.eye(3)

M.todense()
matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

m.todense()
matrix([[ 1.,  0.,  0.],
    [ 0.,  1.,  0.],
    [ 0.,  0.,  1.]])

Then assign:

M[1:4, 1:4] = m

Now the result is:

M.todense()
matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  1.,  0.,  0.,  2.],
    [ 2.,  0.,  1.,  0.,  2.],
    [ 2.,  0.,  0.,  1.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

Whereas I need the result to be:

matrix([[ 2.,  2.,  2.,  2.,  2.],
    [ 2.,  1.,  2.,  2.,  2.],
    [ 2.,  2.,  1.,  2.,  2.],
    [ 2.,  2.,  2.,  1.,  2.],
    [ 2.,  2.,  2.,  2.,  2.]])

回答1:


Should this line

The problem is that the localGrid is sparse, but also the non-zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?

be changed to?

The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?

Your question isn't quite clear, but I'm guessing that because the globalGrid[a:b, c:d] indexing spans values that should be 0 in both arrays, that you are worried that 0's are being copied.

Let's try this with real matrices.

In [13]: M=sparse.lil_matrix((10,10))
In [14]: m=sparse.eye(3)
In [15]: M[4:7,5:8]=m
In [16]: m
Out[16]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements (1 diagonals) in DIAgonal format>
In [17]: M
Out[17]: 
<10x10 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in LInked List format>
In [18]: M.data
Out[18]: array([[], [], [], [], [1.0], [1.0], [1.0], [], [], []], dtype=object)
In [19]: M.rows
Out[19]: array([[], [], [], [], [5], [6], [7], [], [], []], dtype=object)

M does not have any unnecessary 0's.

If there are unnecessary 0's in a sparse matrix, a round trip to csr format should take care of them

M.tocsr().tolil()

csr format also has an inplace .eliminate_zeros() method.


So your concern is with over writing the nonzeros of the target array.

With dense arrays, the use of nonzero (or where) takes care of this:

In [87]: X=np.ones((10,10),int)*2
In [88]: y=np.eye(3)
In [89]: I,J=np.nonzero(y)
In [90]: X[I+3,J+2]=y[I,J]
In [91]: X
Out[91]: 
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])

Trying the sparse equivalent:

In [92]: M=sparse.lil_matrix(X)
In [93]: M
Out[93]: 
<10x10 sparse matrix of type '<class 'numpy.int32'>'
    with 100 stored elements in LInked List format>
In [94]: m=sparse.coo_matrix(y)
In [95]: m
Out[95]: 
<3x3 sparse matrix of type '<class 'numpy.float64'>'
    with 3 stored elements in COOrdinate format>
In [96]: I,J=np.nonzero(m)
In [97]: I
Out[97]: array([0, 1, 2], dtype=int32)
In [98]: J
Out[98]: array([0, 1, 2], dtype=int32)
In [99]: M[I+3,J+2]=m[I,J]
...
TypeError: 'coo_matrix' object is not subscriptable

I could have used the sparse matrix own nonzero.

In [106]: I,J=m.nonzero()

For coo format, this is the same as

In [109]: I,J=m.row, m.col 

In which case I can also use the data attribute:

In [100]: M[I+3,J+2]=m.data
In [101]: M.A
Out[101]: 
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], dtype=int32)

The code for m.nonzero may be instructive

    A = self.tocoo()
    nz_mask = A.data != 0
    return (A.row[nz_mask],A.col[nz_mask])

So you need to be careful to make sure that index and data attributes of the sparse matrix match.

And also pay attention as to which sparse formats allow indexing. lil is good for changing values. csr allows element by element indexing, but raises an efficiency warning if you try to change zero values to nonzero (or v.v.). coo has this nice pairing of indices and data, but doesn't allow indexing.

Another subtle point: in constructing a coo you may repeat coordinates. When converted to csr format those values are summed. But the assignment that I'm suggesting will only use the last value, not the sum. So make sure you understand how your local matrix was constructed, and know whether it is a 'clean' representation of the data.




回答2:


I have found a working solution. Instead of using an assignment I loop over the data in the sparse matrix (using M.data and M.rows) and replace the elements one by one.

for idx, row in enumerate(m.rows):
    for idy, col in enumerate(row):
        M[yOffset+col, xOffset+idx] = m.data[idx][idy]

I am still curious though if there is no simpler/faster method to achieve this result.

My implementation of the above answer:

I,J = m.nonzero()
M[I+yOffset,J+xOffset] = m[I,J]
return M'

This was however marginally slower.



来源:https://stackoverflow.com/questions/34719588/scipy-sparse-matrix-assignment-using-only-stored-elements

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!