Share SciPy Sparse Array Between Process Objects

Anonymous (unverified), submitted 2019-12-03 08:54:24

Question:

I've recently been learning Python multiprocessing and have run into a roadblock. I have a large sparse SciPy array (CSC format) that I need to share, read-only, between 5 worker processes. I've read this and this (numpy-shared), but those seem to cover only dense types.

How would I share a scipy.sparse.csc_matrix() without copying (or with minimal copying) between 5 multiprocessing Process objects? Even the numpy-shared method seems to require copying the entire array, and even then, I can't just convert a scipy.sparse matrix into an mp.Array(). Could anyone help point me in the right direction?

Thanks!

Answer 1:

I cannot help you with the multiprocessing part of your question, but a CSC sparse matrix is little more than three numpy arrays. You can instantiate another sparse matrix, b, sharing the same memory objects as a sparse matrix, a, by doing:

import scipy.sparse as sps

b = sps.csc_matrix((a.data, a.indices, a.indptr), shape=a.shape, copy=False)

a.data, a.indices and a.indptr are the three NumPy arrays you need to share between your processes. If you can do that, instantiating a sparse matrix in each process will be an inexpensive operation.
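For the multiprocessing side, one possible approach (not part of the answer above, so treat it as a sketch under assumptions) is to copy each of the three arrays once into a multiprocessing.RawArray and rebuild the matrix in every worker from NumPy views on those buffers. The helper names share_csc, attach_csc and worker are hypothetical, and the sketch assumes the matrix is only read, since RawArray provides no locking.

```python
import multiprocessing as mp

import numpy as np
import scipy.sparse as sps


def to_shared(arr):
    """Copy a 1-D array once into an unsynchronized shared byte buffer."""
    raw = mp.RawArray("B", arr.nbytes)
    np.frombuffer(raw, dtype=arr.dtype)[:] = arr
    return raw, arr.dtype


def share_csc(a):
    """Hypothetical helper: put the three CSC arrays into shared memory."""
    return {
        "data": to_shared(a.data),
        "indices": to_shared(a.indices),
        "indptr": to_shared(a.indptr),
        "shape": a.shape,
    }


def attach_csc(shared):
    """Hypothetical helper: rebuild the matrix from views on the buffers."""
    data, indices, indptr = (
        np.frombuffer(raw, dtype=dtype)
        for raw, dtype in (shared["data"], shared["indices"], shared["indptr"])
    )
    return sps.csc_matrix(
        (data, indices, indptr), shape=shared["shape"], copy=False
    )


def worker(shared, col, out):
    """Read-only use of the shared matrix: sum one column."""
    m = attach_csc(shared)
    out.put((col, float(m[:, col].sum())))


if __name__ == "__main__":
    a = sps.csc_matrix(np.diag([1.0, 2.0, 3.0, 4.0, 5.0]))
    shared = share_csc(a)  # the only full copy of the matrix contents
    out = mp.Queue()
    procs = [mp.Process(target=worker, args=(shared, j, out)) for j in range(5)]
    for p in procs:
        p.start()
    results = dict(out.get() for _ in procs)
    for p in procs:
        p.join()
    print(results)
```

The shared buffers have to be passed to each Process at creation time (as args here), because RawArray objects are meant to be inherited by child processes rather than sent through a queue afterwards. On Python 3.8 and later, multiprocessing.shared_memory may be an alternative worth looking at.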
