Adding two `csc` sparse matrices of different shapes in Python

Submitted by 夙愿已清 on 2019-12-04 16:41:17

When a matrix is defined in coo format, or with the coo style of input, (data,(row,col)), duplicate entries are summed. People who build stiffness matrices (for PDE solutions) often take advantage of this.
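A minimal sketch of that duplicate-summing behavior, using made-up values (none of them come from the question). Position (0, 0) is listed twice, and the two values are added together when the matrix is converted:

```python
import numpy as np
from scipy import sparse

# Illustrative values only: position (0, 0) appears twice in the input.
data = np.array([1, 2, 5])
row = np.array([0, 0, 1])
col = np.array([0, 0, 2])

m = sparse.coo_matrix((data, (row, col)), shape=(2, 3))

# The coo object still stores all three entries; the duplicates are
# summed on conversion to csr/csc or to a dense array.
dense = m.toarray()
compressed = m.tocsr()
print(dense)
print(compressed.nnz)
```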

This is a function that uses that. I convert the matrices to coo format (if needed), concatenate their attributes, and build a new matrix.

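import numpy as np
from scipy import sparse

def with_coo(x, y):
    # convert to coo format (returns the matrix itself if already coo)
    x = x.tocoo()
    y = y.tocoo()
    # stack the data and index arrays of both matrices
    d = np.concatenate((x.data, y.data))
    r = np.concatenate((x.row, y.row))
    c = np.concatenate((x.col, y.col))
    # no shape given, so it is inferred from the largest row/col index;
    # duplicate (row, col) entries are summed on conversion
    C = sparse.coo_matrix((d, (r, c)))
    return C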

With @Vadim's examples:

In [59]: C_csc=current_flows.tocsc()
In [60]: R_csc=result_flows.tocsc()

In [61]: with_coo(C_csc, R_csc).tocsc().A
Out[61]: 
array([[ 0,  0,  1],
       [-1,  0,  4],
       [ 0, -2,  0],
       [ 3,  0,  0]], dtype=int32)
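
The result above is 4x3 because with_coo passes no shape, so coo_matrix infers it from the largest row and column index present and drops any empty trailing rows or columns. If the sum should cover both inputs, one possible variant passes the shape explicitly. This is only a sketch: the name with_coo_shape and the choice of the elementwise maximum of the two shapes are my assumptions, not something from the question.

```python
import numpy as np
from scipy import sparse

def with_coo_shape(x, y):
    """Add two sparse matrices of possibly different shapes (sketch)."""
    x = x.tocoo()
    y = y.tocoo()
    # Assumed target shape: large enough to hold both operands.
    shape = (max(x.shape[0], y.shape[0]), max(x.shape[1], y.shape[1]))
    d = np.concatenate((x.data, y.data))
    r = np.concatenate((x.row, y.row))
    c = np.concatenate((x.col, y.col))
    # Duplicate (row, col) pairs are summed on conversion to csc.
    return sparse.coo_matrix((d, (r, c)), shape=shape).tocsc()

a = sparse.csc_matrix(np.array([[0, 0, 1], [2, 0, 4], [0, 0, 0], [3, 0, 0]]))
b = sparse.csc_matrix(np.array([[0, 0, 0, 0, 0], [-3, 0, 0, 0, 0], [0, -2, 0, 0, 0]]))
total = with_coo_shape(a, b)
print(total.shape)
print(total.toarray())
```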

When timing these we have to be careful, because format conversions are nontrivial, e.g.

In [70]: timeit  C_csc.tocoo()
10000 loops, best of 3: 128 µs per loop

In [71]: timeit  C_csc.todok()
1000 loops, best of 3: 258 µs per loop

Vadim's two options:

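def with_dok(x, y):
    # modifies x in place; every key of y must lie within x.shape
    for k in y.keys():  # no has_key in py3
        if k in x:
            x[k] += y[k]
        else:
            x[k] = y[k]
    return x

def with_update(x, y):
    # modifies x in place; x.get(k) returns 0 for a missing key
    x.update((k, v+x.get(k)) for k, v in y.items())
    return x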
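
Both helpers change x in place. If the original should stay untouched, a non-mutating variant could look like the sketch below. The name with_dok_copy is hypothetical, and it uses plain item assignment instead of update, on the assumption that item assignment is the safer choice across SciPy versions (worth checking against the installed release). As with the helpers above, every entry of y has to fit inside the shape of x.

```python
from scipy import sparse

def with_dok_copy(x, y):
    """Return x + y as a new dok matrix; y's entries must fit inside x.shape."""
    out = x.todok(copy=True)   # work on a copy so x is left unchanged
    y = y.tocoo()              # coo exposes row, col and data arrays
    for i, j, v in zip(y.row, y.col, y.data):
        out[i, j] += v         # a missing entry reads as 0
    return out

x = sparse.dok_matrix([[0, 0, 1], [2, 0, 4], [0, 0, 0], [3, 0, 0]])
y = sparse.dok_matrix([[0, 0, 0, 0, 0], [-3, 0, 0, 0, 0], [0, -2, 0, 0, 0]])
out = with_dok_copy(x, y)
print(out.toarray())
```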

Starting with the csc format:

In [74]: timeit with_coo(C_csc,R_csc).tocsc()
1000 loops, best of 3: 629 µs per loop

In [76]: timeit with_update(C_csc.todok(),R_csc.todok()).tocsc()
1000 loops, best of 3: 1 ms per loop

In [77]: timeit with_dok(C_csc.todok(),R_csc.todok()).tocsc()
1000 loops, best of 3: 1.12 ms per loop

I'm guessing that my coo approach will scale better - but that's just a guess at this point.

Taking conversions out of the picture, the dok update looks better: y has only 2 items, and the update makes no copies. It changes x directly.

In [78]: %%timeit x=C_csc.todok(); y=R_csc.todok()
   ....: with_update(x, y)
   ....: 
10000 loops, best of 3: 33.6 µs per loop

In [79]: %%timeit x=C_csc.tocoo(); y=R_csc.tocoo()
   ....: with_coo(x, y)
   ....: 
10000 loops, best of 3: 138 µs per loop

================

When other is also a dok matrix, the __add__ method of dok_matrix runs the code below. A comment in the source wonders whether the shape check is needed.

        new = dok_matrix(self.shape, dtype=res_dtype)
        new.update(self)
        for key in other.keys():
            new[key] += other[key]

[I can get around the shape check in x+y if I first change the shape of y, e.g. y._shape = x.shape. This is kludgy and only works within reasonable limits of the original shapes. It also might not be faster than the with_update approach. dok is more amenable to this sort of shape change than csr or csc.]

If other is not a dok matrix, it does self.tocsc()+other.

For matching shapes, the summation times are:

In [91]: timeit current_flows+current_flows
1000 loops, best of 3: 413 µs per loop

In [92]: timeit C_csc+C_csc
1000 loops, best of 3: 223 µs per loop
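
Since + works directly once the shapes match, another route is to bring both operands to a common shape first and then add them. The sketch below assumes a SciPy version that provides the in-place resize method on sparse matrices. The helper name add_resized and the choice of the elementwise maximum shape are hypothetical, and I have not timed this against the approaches above.

```python
import numpy as np
from scipy import sparse

def add_resized(x, y):
    """Add two csc matrices after growing copies to a common shape (sketch)."""
    shape = (max(x.shape[0], y.shape[0]), max(x.shape[1], y.shape[1]))
    # resize works in place, so operate on copies to leave the inputs alone
    xr = x.copy()
    yr = y.copy()
    xr.resize(shape)
    yr.resize(shape)
    return xr + yr

a = sparse.csc_matrix(np.array([[0, 0, 1], [2, 0, 4], [0, 0, 0], [3, 0, 0]]))
b = sparse.csc_matrix(np.array([[0, 0, 0, 0, 0], [-3, 0, 0, 0, 0], [0, -2, 0, 0, 0]]))
total = add_resized(a, b)
print(total.toarray())
```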

You should convert the matrices to the dok class. The indices and data are then stored as a dictionary. Note that the second matrix, result_flows, must not have values at indices outside the shape of current_flows. (Edited, thanks to @hpaulj's comment.)

from scipy import sparse

current_flows = sparse.dok_matrix([[0, 0, 1],
                                   [2, 0, 4],
                                   [0, 0, 0],
                                   [3, 0, 0]]
                                   )

result_flows = sparse.dok_matrix([[0, 0, 0, 0, 0],
                                  [-3, 0, 0, 0, 0],
                                  [0, -2, 0, 0, 0]]
                                  )

current_flows.update((k, v + current_flows.get(k)) for k, v in result_flows.items())

current_flows.todense()

Out[108]: matrix([[ 0,  0,  1],
                  [-1,  0,  4],
                  [ 0, -2,  0],
                  [ 3,  0,  0]])