Converting from sparse to dense to sparse again decreases density after constructing sparse matrix

问题

I am using scipy to generate a sparse finite difference matrix, constructing it initially from block matrices and then editing the diagonal to account for boundary conditions. The resulting sparse matrix is of the BSR type. I have found that if I convert the matrix to a dense matrix and then back to a sparse matrix using the scipy.sparse.BSR_matrix function, I am left with a sparser matrix than before. Here is the code I use to generate the matrix:

size = (4,4)

xDiff = np.zeros((size[0]+1,size[0]))
ix,jx = np.indices(xDiff.shape)
xDiff[ix==jx] = 1
xDiff[ix==jx+1] = -1

yDiff = np.zeros((size[1]+1,size[1]))
iy,jy = np.indices(yDiff.shape)
yDiff[iy==jy] = 1
yDiff[iy==jy+1] = -1

Ax = sp.sparse.dia_matrix(-np.matmul(np.transpose(xDiff),xDiff))
Ay = sp.sparse.dia_matrix(-np.matmul(np.transpose(yDiff),yDiff))

lap = sp.sparse.kron(sp.sparse.eye(size[1]),Ax) + sp.sparse.kron(Ay,sp.sparse.eye(size[0]))

#set up boundary conditions
BC_diag = np.array([2]+[1]*(size[0]-2)+[2]+([1]+[0]*(size[0]-2)+[1])*(size[1]-2)+[2]+[1]*(size[0]-2)+[2])

lap += sp.sparse.diags(BC_diag)

If I check the sparsity of this matrix I see the following:

lap
<16x16 sparse matrix of type '<class 'numpy.float64'>'
with 160 stored elements (blocksize = 4x4) in Block Sparse Row format>

However, if I convert it to a dense matrix and then back to the same sparse format I see a much sparser matrix:

sp.sparse.bsr_matrix(lap.todense())
<16x16 sparse matrix of type '<class 'numpy.float64'>'
with 64 stored elements (blocksize = 1x1) in Block Sparse Row format>

I suspect that the reason this is happening is because I constructed the matrix using the sparse.kron function but my question is if there is a way to arrive at the smaller sparse matrix without converting to dense first, for example if I end up wanting to simulate a very large domain.

回答1:

BSR stores the data in dense blocks:

In [167]: lap.data.shape                                                        
Out[167]: (10, 4, 4)

In this case those blocks have quite a few zeros.

In [168]: lap1 = lap.tocsr() 
In [170]: lap1                                                                  
Out[170]: 
<16x16 sparse matrix of type '<class 'numpy.float64'>'
    with 160 stored elements in Compressed Sparse Row format>
In [171]: lap1.data                                                             
Out[171]: 
array([-2.,  1.,  0.,  0.,  1.,  0.,  0.,  0.,  1., -3.,  1.,  0.,  0.,
        1.,  0.,  0.,  0.,  1., -3.,  1.,  0.,  0.,  1.,  0.,  0.,  0.,
        1., -2.,  0.,  0.,  0.,  1.,  1.,  0.,  0.,  0., -3.,  1.,  0.,
        0.,  1.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  1., -4.,  1.,  0., 
        ...
        0.,  0.,  1., -2.])

In place cleanup:

In [172]: lap1.eliminate_zeros()                                                
In [173]: lap1                                                                  
Out[173]: 
<16x16 sparse matrix of type '<class 'numpy.float64'>'
    with 64 stored elements in Compressed Sparse Row format>

If I specify the csr format when using kron:

In [181]: lap2 = sparse.kron(np.eye(size[1]),Ax,format='csr') + sparse.kron(Ay,n
     ...: p.eye(size[0]), format='csr')                                         
In [182]: lap2                                                                  
Out[182]: 
<16x16 sparse matrix of type '<class 'numpy.float64'>'
    with 64 stored elements in Compressed Sparse Row format>

回答2:

[I have been informed that my answer is incorrect. The reason, if I understand, is that Scipy is not using Lapack for creating matrices but is using its own code for this purpose. Interesting. The information though unexpected has the ring of authority. I shall defer to it!

[I will leave the answer posted for reference, but no longer assert that the answer were correct.]

Generally speaking, when it comes to complicated data structures like sparse matrices, you have two cases:

the constructor knows the structure's full contents in advance; or
the structure is designed to be built up gradually so that the structure's full contents are known only after the structure is complete.

The classic case of the complicated data structure is the case of the binary tree. You can make a binary tree more efficient by copying it after it is complete. Otherwise, the standard red-black implementation of the tree leaves some search paths as long as twice as long as others—which is usually okay but is not optimal.

Now, you probably knew all that, but I mention it for a reason. Scipy depends on Lapack. Lapack brings several different storage schemes. Two of these are the

general sparse and
banded

schemes. It would appear that Scipy begins by storing your matrix as sparse, where the indices of each nonzero element are explicitly stored; but that, on copy, Scipy notices that the banded representation is the more appropriate—for your matrix is, after all, banded.

来源：https://stackoverflow.com/questions/55169282/converting-from-sparse-to-dense-to-sparse-again-decreases-density-after-construc

标签

python

matrix

scipy

sparse-matrix