How to read/traverse/slice Scipy sparse matrices (LIL, CSR, COO, DOK) faster?

悲哀的现实 2021-01-06 18:44

To manipulate SciPy sparse matrices, the built-in methods are typically used. But sometimes you need to read the matrix data to assign it to non-sparse data types. For the sake of…

2 answers

灰色年华 (OP)

2021-01-06 18:50
Try reading the raw data directly. SciPy sparse matrices store their contents in NumPy ndarrays, and each format lays those arrays out differently.
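To make "laid out differently" concrete, here is a small sketch that builds the same hypothetical 3x4 matrix in LIL, CSR and COO form and prints the arrays each one stores. The example values are made up; the attributes printed (`rows`/`data` for LIL, `indptr`/`indices`/`data` for CSR, `row`/`col`/`data` for COO) are the ones the loops below read.

```python
import numpy as np
from scipy import sparse

# Hypothetical 3x4 example matrix with three nonzeros; row 1 is empty
dense = np.array([[0, 5, 0, 0],
                  [0, 0, 0, 0],
                  [7, 0, 0, 9]])

lil = sparse.lil_matrix(dense)
csr = lil.tocsr()
coo = lil.tocoo()

# LIL: per row, a Python list of column indices and a list of matching values
print("LIL rows:", lil.rows)
print("LIL data:", lil.data)

# CSR: row i's entries sit in indices/data at positions indptr[i] .. indptr[i+1]-1
print("CSR indptr:", csr.indptr, "indices:", csr.indices, "data:", csr.data)

# COO: three parallel arrays, one (row, col, value) triple per stored entry
print("COO row:", coo.row, "col:", coo.col, "data:", coo.data)
```

Note how the empty row 1 shows up as an empty list in LIL and as a repeated value in CSR's `indptr`.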

Reading the raw data of a LIL sparse matrix
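```
%%timeit -n3
# Assumes `lil` (a scipy.sparse.lil_matrix) and `arr` (a dense NumPy array
# of the same shape) were created in an earlier cell.
# lil.rows[i] holds the column indices of row i, lil.data[i] the matching values.
for i, (row, data) in enumerate(zip(lil.rows, lil.data)):
    for j, val in zip(row, data):
        arr[i, j] = val
```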
3 loops, best of 3: 4.61 ms per loop
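The `%%timeit` cell above relies on `lil` and `arr` existing already. A self-contained, plain-Python version of the same LIL loop might look like this; the helper name `lil_to_dense` and the random 100x100 test matrix are illustrative assumptions, and the result is checked against `toarray()`.

```python
import numpy as np
from scipy import sparse

def lil_to_dense(lil):
    """Hypothetical helper: fill a dense array by walking the LIL row lists."""
    arr = np.zeros(lil.shape, dtype=lil.dtype)
    # lil.rows[i] lists the column indices of row i; lil.data[i] the values
    for i, (row, data) in enumerate(zip(lil.rows, lil.data)):
        for j, val in zip(row, data):
            arr[i, j] = val
    return arr

# Illustrative test matrix: 100x100, about 5% nonzeros, fixed seed
lil = sparse.random(100, 100, density=0.05, format="lil", random_state=0)
arr = lil_to_dense(lil)
print(np.array_equal(arr, lil.toarray()))
```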

Reading the raw data of a CSR sparse matrix

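For a CSR matrix, reading the raw data is a bit less Pythonic, but it is still worth it for the speed. Because `%%timeit` must be the first line of an IPython cell, do the conversion in a separate cell first:
```
csr = lil.tocsr()
```
```
%%timeit -n3
# Assumes `csr` and the dense `arr` exist from earlier cells.
# Row i's entries occupy csr.indices/csr.data[indptr[i]:indptr[i+1]].
start = 0
for i, end in enumerate(csr.indptr[1:]):
    for j, val in zip(csr.indices[start:end], csr.data[start:end]):
        arr[i, j] = val
    start = end
```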
3 loops, best of 3: 8.14 ms per loop
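A self-contained sketch of the CSR loop follows. Instead of carrying `start` from one iteration to the next, it reads `indptr[i]` and `indptr[i + 1]` for each row, which visits exactly the same slices. The helper name `csr_to_dense` and the random test matrix are hypothetical; the check against `toarray()` assumes a canonical CSR matrix without duplicate entries (which `sparse.random` produces).

```python
import numpy as np
from scipy import sparse

def csr_to_dense(csr):
    """Hypothetical helper: fill a dense array from the raw CSR arrays."""
    arr = np.zeros(csr.shape, dtype=csr.dtype)
    # Local names avoid repeated attribute lookups inside the loops
    indptr, indices, data = csr.indptr, csr.indices, csr.data
    for i in range(csr.shape[0]):
        # Entries of row i live at positions indptr[i] .. indptr[i+1]-1
        start, end = indptr[i], indptr[i + 1]
        for j, val in zip(indices[start:end], data[start:end]):
            arr[i, j] = val
    return arr

csr = sparse.random(100, 100, density=0.05, format="csr", random_state=0)
print(np.array_equal(csr_to_dense(csr), csr.toarray()))
```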

A similar approach is used in this DBSCAN implementation.

Reading the raw data of a COO sparse matrix
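```
%%timeit -n3
# Assumes `coo` (e.g. lil.tocoo()) and the dense `arr` exist from earlier cells.
# coo.row, coo.col and coo.data are parallel arrays, one entry per stored value.
for i, j, d in zip(coo.row, coo.col, coo.data):
    arr[i, j] = d
```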
3 loops, best of 3: 5.97 ms per loop
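Here is a runnable version of the COO loop next to a NumPy fancy-indexing assignment that writes all stored values in one call. The vectorized line is not one of the methods timed above; it is included only as a point of comparison because it avoids the Python-level loop. Like the loop, it does not sum duplicate (i, j) entries, so both match `toarray()` only for a COO matrix without duplicates, as assumed for the random test matrix here.

```python
import numpy as np
from scipy import sparse

coo = sparse.random(100, 100, density=0.05, format="coo", random_state=0)

# Loop version, reading the three parallel COO arrays
arr_loop = np.zeros(coo.shape, dtype=coo.dtype)
for i, j, d in zip(coo.row, coo.col, coo.data):
    arr_loop[i, j] = d

# Vectorized alternative: one fancy-indexing assignment.
# Duplicate (i, j) pairs would not be summed here either.
arr_vec = np.zeros(coo.shape, dtype=coo.dtype)
arr_vec[coo.row, coo.col] = coo.data

print(np.array_equal(arr_loop, arr_vec), np.array_equal(arr_vec, coo.toarray()))
```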

Based on these limited tests:
- COO matrix: cleanest
- LIL matrix: fastest
- CSR matrix: slowest and ugliest. The only upside is that conversion to and from CSR is extremely fast.
Edit: I added the COO approach from @hpaulj so that all the methods are in one place.
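The timings above come from IPython's `%%timeit`. To rerun a comparison like this outside IPython, a sketch with the standard `timeit` module could look like the following. The matrix size, density and seed are arbitrary assumptions, each reader is checked against `toarray()` before timing, and the printed numbers will depend on the machine and SciPy version, so they are not expected to match the figures above.

```python
import timeit
import numpy as np
from scipy import sparse

# Hypothetical benchmark matrix; size, density and seed are arbitrary
lil = sparse.random(1000, 1000, density=0.01, format="lil", random_state=0)
csr = lil.tocsr()
coo = lil.tocoo()

def read_lil():
    arr = np.zeros(lil.shape)
    for i, (row, data) in enumerate(zip(lil.rows, lil.data)):
        for j, val in zip(row, data):
            arr[i, j] = val
    return arr

def read_csr():
    arr = np.zeros(csr.shape)
    for i in range(csr.shape[0]):
        start, end = csr.indptr[i], csr.indptr[i + 1]
        for j, val in zip(csr.indices[start:end], csr.data[start:end]):
            arr[i, j] = val
    return arr

def read_coo():
    arr = np.zeros(coo.shape)
    for i, j, d in zip(coo.row, coo.col, coo.data):
        arr[i, j] = d
    return arr

expected = lil.toarray()
for name, fn in [("LIL", read_lil), ("CSR", read_csr), ("COO", read_coo)]:
    assert np.array_equal(fn(), expected)
    # Best of 3 repeats of 3 calls each, reported per call (like %timeit -n3)
    best = min(timeit.repeat(fn, repeat=3, number=3)) / 3
    print(f"{name}: {best * 1e3:.2f} ms per loop")
```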