Building and updating a sparse matrix in python using scipy

谁说胖子不能爱 提交于 2019-12-29 20:12:13

问题


I'm trying to build and update a sparse matrix as I read data from file. The matrix is of size 100000X40000

What is the most efficient way of updating multiple entries of the sparse matrix? specifically I need to increment each entry by 1.

Let's say I have row indices [2, 236, 246, 389, 1691]

and column indices [117, 3, 34, 2757, 74, 1635, 52]

so all the following entries must be incremented by one:

(2,117) (2,3) (2,34) (2,2757) ...

(236,117) (236,3) (236, 34) (236,2757) ...

and so on.

I'm already using lil_matrix as it gave me a warning to use while I tried to update a single entry.

lil_matrix format is already not supporting multiple updating. matrix[1:3,0] += [2,3] is giving me a notimplemented error.

I can do this naively, by incrementing every entry individually. I was wondering if there is any better way to do this, or better sparse matrix implementation that I can use.

My computer is also an average i5 machine with 4GB RAM, so I have to be careful not to blow it up :)


回答1:


Creating a second matrix with 1s in your new coordinates and adding it to the existing one is a possible way of doing this:

>>> import scipy.sparse as sps
>>> shape = (1000, 2000)
>>> rows, cols = 1000, 2000
>>> sps_acc = sps.coo_matrix((rows, cols)) # empty matrix
>>> for j in xrange(100): # add 100 sets of 100 1's
...     r = np.random.randint(rows, size=100)
...     c = np.random.randint(cols, size=100)
...     d = np.ones((100,))
...     sps_acc = sps_acc + sps.coo_matrix((d, (r, c)), shape=(rows, cols))
... 
>>> sps_acc
<1000x2000 sparse matrix of type '<type 'numpy.float64'>'
    with 9985 stored elements in Compressed Sparse Row format>



回答2:


import scipy.sparse

rows = [2, 236, 246, 389, 1691]
cols = [117, 3, 34, 2757, 74, 1635, 52]
prod = [(x, y) for x in rows for y in cols] # combinations
r = [x for (x, y) in prod] # x_coordinate
c = [y for (x, y) in prod] # y_coordinate
data = [1] * len(r)
m = scipy.sparse.coo_matrix((data, (r, c)), shape=(100000, 40000))

I think it works well and doesn't need loops. I am directly following the doc

<100000x40000 sparse matrix of type '<type 'numpy.int32'>'
    with 35 stored elements in COOrdinate format>



回答3:


This answer expands the comment of @behzad.nouri. To increment the values at the "outer product" of your lists of rows and columns indices, just create these as numpy arrays configured for broadcasting. In this case, that means put the rows into a column. For example,

In [59]: a = lil_matrix((4,4), dtype=int)

In [60]: a.A
Out[60]: 
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

In [61]: rows = np.array([1,3]).reshape(-1, 1)

In [62]: rows
Out[62]: 
array([[1],
       [3]])

In [63]: cols = np.array([0, 2, 3])

In [64]: a[rows, cols] += np.ones((rows.size, cols.size))

In [65]: a.A
Out[65]: 
array([[0, 0, 0, 0],
       [1, 0, 1, 1],
       [0, 0, 0, 0],
       [1, 0, 1, 1]])

In [66]: rows = np.array([0, 1]).reshape(-1,1)

In [67]: cols = np.array([1, 2])

In [68]: a[rows, cols] += np.ones((rows.size, cols.size))

In [69]: a.A
Out[69]: 
array([[0, 1, 1, 0],
       [1, 1, 2, 1],
       [0, 0, 0, 0],
       [1, 0, 1, 1]])


来源:https://stackoverflow.com/questions/20583381/building-and-updating-a-sparse-matrix-in-python-using-scipy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!