I am trying to perform a large linear-algebra computation to transform a generic covariance matrix KK_l_obs (shape (NL, NL)) into a map of covariance matrices.
One very simple performance tweak that's often overlooked in NumPy is replacing division with multiplication. It makes no noticeable difference for scalar-to-scalar divisions or for divisions between equal-shaped arrays. But NumPy's implicit broadcasting makes it interesting for divisions that broadcast between arrays of different shapes, or between an array and a scalar. For those cases we can get a noticeable boost by multiplying with the precomputed reciprocals instead. So, for the stated problem, we pre-compute the reciprocal of a_map and use multiplication in place of division.
So, at the start, do:

    r_a_map = 1.0/a_map
Then, within the nested loops, use it as:

    KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl] * r_a_map[si[0], si[1]]
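As a quick sanity check that the substitution is safe, here's a minimal, self-contained sketch (array names and shapes here are made up for illustration) showing that dividing a block by a map entry agrees with multiplying by the precomputed reciprocal, up to floating-point rounding:

```python
import numpy as np

# Hypothetical small stand-ins for the arrays in this answer:
# a 2D "map" of scalars and a block we would divide by one of them.
rng = np.random.default_rng(0)
a_map = rng.random((4, 5)) + 0.5        # offset keeps values away from zero
block = rng.random((16, 16))

# Pre-compute the reciprocals once ...
r_a_map = 1.0 / a_map

# ... then each per-pixel division becomes a multiplication.
out_div = block / a_map[2, 3]
out_mul = block * r_a_map[2, 3]

# The two agree to floating-point rounding.
print(np.allclose(out_div, out_mul))   # True
```

The payoff grows with the number of loop iterations, since each division of an (nl, nl) block is replaced by a cheaper multiplication.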
We could also use the distributive property of matrix multiplication there:

    A*(B + C) = A*B + A*C

Thus, k_l_th, which is added in every iteration but stays constant, can be taken out of the nested loops and its contribution added once at the end. That contribution is E.dot(k_l_th).dot(E.T), which we simply add to K_PC.
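The hoisting step can be verified on toy data. The sketch below uses hypothetical small shapes for E, the scaled block A, and k_l_th, and checks that projecting the sum equals the sum of the projections:

```python
import numpy as np

# Toy shapes standing in for this answer's arrays (sizes are arbitrary):
# E is (q, nl); A plays the role of the scaled KK_l_obs block; k_l_th is
# the constant term added inside the loop.
rng = np.random.default_rng(1)
q, nl = 3, 8
E = rng.random((q, nl))
A = rng.random((nl, nl))
k_l_th = rng.random((nl, nl))

# Inside-the-loop form: E.(A + k).E^T
inside = E.dot(A + k_l_th).dot(E.T)

# Hoisted form: E.A.E^T + E.k.E^T -- the second term is loop-invariant,
# so it can be computed once and added after the loop.
hoisted = E.dot(A).dot(E.T) + E.dot(k_l_th).dot(E.T)

print(np.allclose(inside, hoisted))   # True
```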
Using tweak #1 and tweak #2, we end up with a modified approach, like so -
    def original_mod_app():
        r_a_map = 1.0/a_map
        K_PC = np.empty((q, q, X, Y))
        inds = np.ndindex((X, Y))
        for si in inds:
            I0_ = I0[si[0], si[1]]
            K_PC[..., si[0], si[1]] = E.dot(
                KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl] *
                r_a_map[si[0], si[1]]).dot(E.T)
        return K_PC + E.dot(k_l_th).dot(E.T)[:, :, None, None]
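For completeness, here's a self-contained sanity check on synthetic data. The direct() function below is my assumed form of the original per-pixel loop (divide the block, add k_l_th, then project), and modified() applies both tweaks; all names and shapes are stand-ins for the question's setup:

```python
import numpy as np

# Small synthetic stand-ins for the question's arrays (shapes chosen
# arbitrarily here; the real NL, nl, q, X, Y come from the problem setup).
rng = np.random.default_rng(2)
NL, nl, q, X, Y = 20, 6, 3, 4, 5
KK_l_obs = rng.random((NL, NL))
k_l_th = rng.random((nl, nl))
E = rng.random((q, nl))
a_map = rng.random((X, Y)) + 0.5
I0 = rng.integers(0, NL - nl, size=(X, Y))   # valid block start indices

def direct():
    # Assumed original form: divide, add k_l_th, project -- all per pixel.
    K_PC = np.empty((q, q, X, Y))
    for si in np.ndindex((X, Y)):
        I0_ = I0[si]
        block = KK_l_obs[I0_:I0_ + nl, I0_:I0_ + nl] / a_map[si] + k_l_th
        K_PC[..., si[0], si[1]] = E.dot(block).dot(E.T)
    return K_PC

def modified():
    # Tweak #1: multiply by reciprocals; tweak #2: hoist the k_l_th term.
    r_a_map = 1.0 / a_map
    K_PC = np.empty((q, q, X, Y))
    for si in np.ndindex((X, Y)):
        I0_ = I0[si]
        K_PC[..., si[0], si[1]] = E.dot(
            KK_l_obs[I0_:I0_ + nl, I0_:I0_ + nl] * r_a_map[si]).dot(E.T)
    return K_PC + E.dot(k_l_th).dot(E.T)[:, :, None, None]

print(np.allclose(direct(), modified()))   # True
```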
Runtime test with the same sample setup as used in the question -

    In [458]: %timeit original_app()
    1 loops, best of 3: 1.4 s per loop

    In [459]: %timeit original_mod_app()
    1 loops, best of 3: 677 ms per loop

    In [460]: np.allclose(original_app(), original_mod_app())
    Out[460]: True
So, we are getting a speedup of 2x+ there.