Question
Every once in a while, I get to manipulate a csr_matrix but I always forget how the parameters indices and indptr work together to build a sparse matrix.
I am looking for a clear and intuitive explanation of how indptr interacts with both the data and indices parameters when defining a sparse matrix using the notation csr_matrix((data, indices, indptr), [shape=(M, N)]).
I can see from the scipy documentation that the data parameter contains all the non-zero data, and the indices parameter contains the columns associated with that data (as such, indices is equal to col in the example given in the documentation). But how can we explain the indptr parameter in clear terms?
Answer 1:
Maybe this explanation can help understand the concept:
- data is an array containing all the non-zero elements of the sparse matrix.
- indices is an array mapping each element in data to its column in the sparse matrix.
- indptr then maps the elements of data and indices to the rows of the sparse matrix. This is done with the following reasoning:
  - If the sparse matrix has M rows, indptr is an array containing M+1 elements.
  - For row i, [indptr[i]:indptr[i+1]] returns the indices of elements to take from data and indices corresponding to row i. So suppose indptr[i]=k and indptr[i+1]=l: the data corresponding to row i would be data[k:l] at columns indices[k:l]. This is the tricky part, and I hope the following example helps in understanding it.
EDIT: I replaced the numbers in data with letters to avoid confusion in the following example.
Note: the values in indptr are necessarily increasing, because the next cell in indptr (the next row) is referring to the next values in data and indices corresponding to that row.
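The slicing rule above can be sketched in plain Python, without scipy. The arrays below are hypothetical values chosen for illustration (letters stand in for the non-zero data, and a 3x3 shape is assumed); they are not taken from the answer itself.

```python
# Hypothetical CSR triple for an assumed 3x3 matrix.
# Letters stand in for the non-zero values so they cannot be confused with indices.
data = ["a", "b", "c", "d", "e", "f"]
indices = [0, 2, 2, 0, 1, 2]   # column of each entry in data
indptr = [0, 2, 3, 6]          # M + 1 entries for M = 3 rows

n_rows = len(indptr) - 1
n_cols = 3  # assumed shape; CSR arrays alone do not fix the column count

# Rebuild the dense layout row by row, using "0" for positions with no stored value.
dense = [["0"] * n_cols for _ in range(n_rows)]
for i in range(n_rows):
    k, l = indptr[i], indptr[i + 1]
    # data[k:l] are the values of row i, indices[k:l] their columns.
    for value, col in zip(data[k:l], indices[k:l]):
        dense[i][col] = value

for row in dense:
    print(" ".join(row))
```

Row 0 takes data[0:2], row 1 takes data[2:3], and row 2 takes data[3:6], which is why the values in indptr can never decrease.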
Answer 2:
Sure, the elements inside indptr are in ascending order. But how do we explain the behavior of indptr? In short: whenever indptr[i+1] equals indptr[i] (the value does not increase), row i of the sparse matrix holds no stored values and can be skipped.
The following example illustrates the above interpretation of indptr elements:
Example 1) imagine this matrix:
array([[0, 1, 0],
       [8, 0, 0],
       [0, 0, 0],
       [0, 0, 0],
       [0, 0, 7]])
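from scipy.sparse import csr_matrix
# csr_matrix((data, indices, indptr), shape=(M, N))
mat1 = csr_matrix(([1,8,7], [1,0,2], [0,1,2,2,2,3]), shape=(5,3))
mat1.indptr
# array([0, 1, 2, 2, 2, 3], dtype=int32)
mat1.todense() # dense form of the sparse matrix; matches the array shown above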
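As a rough check of the "skip the row" reading, the differences between consecutive indptr entries give the number of stored values per row. This is a minimal sketch in plain Python using the indptr from Example 1; the names counts and empty_rows are introduced here for illustration only.

```python
# indptr copied from Example 1 (5 rows, hence 6 entries).
indptr = [0, 1, 2, 2, 2, 3]

# Row i stores indptr[i + 1] - indptr[i] values.
counts = [indptr[i + 1] - indptr[i] for i in range(len(indptr) - 1)]

# Rows where indptr does not increase store nothing, i.e. they are all zeros.
empty_rows = [i for i, c in enumerate(counts) if c == 0]

print(counts)      # stored values per row
print(empty_rows)  # indices of the all-zero rows
```

For this indptr, rows 2 and 3 come out empty, which agrees with the two all-zero rows of the matrix in Example 1.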
Example 2) Array to csr_matrix (the case when the matrix already exists as a dense array):
import numpy as np
from scipy.sparse import csr_matrix
arr = np.array([[0, 0, 0],
                [8, 0, 0],
                [0, 5, 4],
                [0, 0, 0],
                [0, 0, 7]])
mat2 = csr_matrix(arr)
mat2.indptr
# array([0, 0, 1, 3, 3, 4], dtype=int32)
mat2.indices
# array([0, 1, 2, 2], dtype=int32)
mat2.data
# array([8, 5, 4, 7], dtype=int32)
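To see where those three arrays come from, the conversion can be imitated by hand. The sketch below is plain Python and is not how scipy implements it; dense_to_csr is a hypothetical helper written only for this illustration, applied to the same values as arr in Example 2.

```python
def dense_to_csr(rows):
    """Hypothetical helper: build (data, indices, indptr) from a list of rows."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)    # the non-zero value itself
                indices.append(col)   # the column it sits in
        # After finishing a row, record how many values have been stored so far.
        indptr.append(len(data))
    return data, indices, indptr

# Same values as arr in Example 2, written as nested lists.
arr = [[0, 0, 0],
       [8, 0, 0],
       [0, 5, 4],
       [0, 0, 0],
       [0, 0, 7]]

data, indices, indptr = dense_to_csr(arr)
print(data)
print(indices)
print(indptr)
```

Because indptr is a running count of stored values, an all-zero row simply repeats the previous count, which is the repeated value seen in mat2.indptr.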
Source: https://stackoverflow.com/questions/52299420/scipy-csr-matrix-understand-indptr