why is row indexing of scipy csr matrices slower compared to numpy arrays

Submitted by 可紊 on 2019-12-19 04:42:55

Question


I'm not sure what I am doing wrong, but it seems that row indexing a scipy csr_matrix is roughly two orders of magnitude slower than indexing a numpy array (see the code and timings below).

Shouldn't row indexing of csr matrices be faster than of dense matrices, since only a few non-zero elements are extracted, as in the case below?

Are there tricks to make row indexing faster for scipy csr matrices?

import numpy as np
import timeit
from scipy.sparse import csr_matrix

# Generate random matrix
A = np.random.rand(5000, 1000)

# Make A sparse
A[:, 4:] = 0

# Create sparse matrix
A_sparse = csr_matrix(A)

# Create row indexing functions
def index_A_dense():
    A[4]

def index_A_dense_copy():
    a = A[4].copy()

def index_A_sparse():
    A_sparse[4]

timeit.timeit(index_A_sparse, number=10000)
Out: 0.6668063667304978
timeit.timeit(index_A_dense, number=10000)
Out: 0.0029007720424942818
timeit.timeit(index_A_dense_copy, number=10000)
Out: 0.00979283193282754

Thanks in advance!


Answer 1:


The short answer, which I demonstrate below, is that constructing a new sparse matrix is expensive. There's a significant overhead that does not depend on the number of rows or the number of nonzero elements in a particular row.


Data representation for sparse matrices is quite different from that for dense arrays. Arrays store the data in one contiguous buffer, and efficiently use the shape and strides to iterate over selected values. Those values, plus the index, define exactly where in the buffer the data will be found. Copying those N bytes from one location to another is a relatively minor part of the whole operation.

A sparse matrix stores the data in several arrays (or other structures), containing the indices and the data. Selecting a row then requires looking up the relevant indices, and constructing a new sparse matrix with the selected indices and data. There is compiled code in the sparse package, but not nearly as much low-level code as with numpy arrays.

To illustrate, I'll make a small, not-too-sparse matrix, so we don't have a lot of empty rows:

In [259]: A = (sparse.rand(5,5,.4,'csr')*20).floor()
In [260]: A
Out[260]: 
<5x5 sparse matrix of type '<class 'numpy.float64'>'
    with 10 stored elements in Compressed Sparse Row format>

The dense equivalent, and a row copy:

In [262]: Ad=A.A
In [263]: Ad
Out[263]: 
array([[  0.,   0.,   0.,   0.,  10.],
       [  0.,   0.,   0.,   0.,   0.],
       [ 17.,  16.,  14.,  19.,   6.],
       [  0.,   0.,   1.,   0.,   0.],
       [ 14.,   0.,   9.,   0.,   0.]])
In [264]: Ad[4,:]
Out[264]: array([ 14.,   0.,   9.,   0.,   0.])
In [265]: timeit Ad[4,:].copy()
100000 loops, best of 3: 4.58 µs per loop

A matrix row:

In [266]: A[4,:]
Out[266]: 
<1x5 sparse matrix of type '<class 'numpy.float64'>'
    with 2 stored elements in Compressed Sparse Row format>

Look at the data representation for this csr matrix, 3 1d arrays:

In [267]: A.data
Out[267]: array([  0.,  10.,  17.,  16.,  14.,  19.,   6.,   1.,  14.,   9.])  
In [268]: A.indices
Out[268]: array([3, 4, 0, 1, 2, 3, 4, 2, 0, 2], dtype=int32)
In [269]: A.indptr
Out[269]: array([ 0,  2,  2,  7,  8, 10], dtype=int32)

Here's how the row is selected (but in compiled code):

In [270]: A.indices[A.indptr[4]:A.indptr[5]]
Out[270]: array([0, 2], dtype=int32)
In [271]: A.data[A.indptr[4]:A.indptr[5]]
Out[271]: array([ 14.,   9.])
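
That indptr slicing is all it takes to pull out a row's raw contents. As a sketch (the helper name `csr_row` is my own, not part of scipy's API), here is the same lookup wrapped in a function, so you can grab a row's column indices and values without building a new sparse matrix at all:

```python
import numpy as np
from scipy.sparse import csr_matrix

def csr_row(m, i):
    """Return (column indices, values) of row i of a CSR matrix m,
    by slicing the underlying indptr/indices/data arrays directly."""
    start, stop = m.indptr[i], m.indptr[i + 1]
    return m.indices[start:stop], m.data[start:stop]

A = csr_matrix(np.array([[0., 0., 10.],
                         [14., 0., 9.]]))
cols, vals = csr_row(A, 1)
print(cols, vals)  # columns [0 2], values [14.  9.]
```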

The 'row' is another sparse matrix, with the same sort of data arrays:

In [272]: A[4,:].indptr
Out[272]: array([0, 2])
In [273]: A[4,:].indices
Out[273]: array([0, 2])
In [274]: timeit A[4,:]
10000 loops, best of 3: 145 µs per loop
In [275]: timeit Ad[4,:].copy()
100000 loops, best of 3: 4.56 µs per loop

Yes, timing for the sparse matrix is slow. I don't know how much of the time is spent in actually selecting the data, and how much is spent constructing the new matrix.

The lil format may be easier to understand, since the data and indices are stored in sublists, one per row.

In [276]: Al=A.tolil()
In [277]: Al.data
Out[277]: array([[0.0, 10.0], [], [17.0, 16.0, 14.0, 19.0, 6.0], [1.0], [14.0, 9.0]], dtype=object)
In [278]: Al.rows
Out[278]: array([[3, 4], [], [0, 1, 2, 3, 4], [2], [0, 2]], dtype=object)
In [279]: Al[4,:].data
Out[279]: array([[14.0, 9.0]], dtype=object)
In [280]: Al[4,:].rows
Out[280]: array([[0, 2]], dtype=object)
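
Those per-row lists can also be read directly off the `.rows` and `.data` attributes, skipping matrix construction. A minimal example:

```python
import numpy as np
from scipy.sparse import lil_matrix

# A lil_matrix keeps one Python list of column indices and one of
# values per row; indexing those object arrays is plain list access.
Al = lil_matrix(np.array([[0., 0., 5.],
                          [2., 0., 4.]]))
print(Al.rows[1], Al.data[1])  # [0, 2] [2.0, 4.0]
```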

Speed comparisons like this make sense when dealing with tight compiled code, where moving bytes from one part of memory to another is the dominant time cost. With the mix of Python and compiled code in numpy and scipy, you can't just count O(n) operations.

=============================

Here's an estimate of the time it takes to select a row from A, versus the time it takes to return a new sparse matrix:

Just get the data:

In [292]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
   .....: 
100000 loops, best of 3: 4.92 µs per loop

plus the time it takes to make a matrix:

In [293]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
sparse.csr_matrix((d1,([0,0],i1)),shape=(1,5))
   .....: 
1000 loops, best of 3: 445 µs per loop

Try a simpler coo matrix:

In [294]: %%timeit
d1=A.data[A.indptr[4]:A.indptr[5]]
i1=A.indices[A.indptr[4]:A.indptr[5]]
sparse.coo_matrix((d1,([0,0],i1)),shape=(1,5))
   .....: 
10000 loops, best of 3: 135 µs per loop
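
So most of the 145 µs goes into matrix construction, not data selection. This suggests one possible "trick" for the question above: if you only need the row's values rather than a sparse matrix object, fill a dense vector from the CSR arrays yourself. The helper below (`row_as_dense` is an illustrative name, not scipy API) is a sketch of that idea:

```python
import numpy as np
from scipy.sparse import csr_matrix

def row_as_dense(m, i):
    """Return row i of CSR matrix m as a dense 1-d array,
    avoiding the overhead of constructing a new sparse matrix."""
    start, stop = m.indptr[i], m.indptr[i + 1]
    out = np.zeros(m.shape[1], dtype=m.dtype)
    out[m.indices[start:stop]] = m.data[start:stop]
    return out

A = csr_matrix(np.array([[0., 7., 0., 3.],
                         [0., 0., 0., 0.]]))
print(row_as_dense(A, 0))  # [0. 7. 0. 3.]
```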


Source: https://stackoverflow.com/questions/34010334/why-is-row-indexing-of-scipy-csr-matrices-slower-compared-to-numpy-arrays
