I am trying to perform a large linear-algebra computation to transform a generic covariance matrix KK_l_obs (shape (NL, NL)) into a map of covariance matrices.
On a relatively modest machine (4 GB of memory), a matmul calculation on the whole 10x10x1000x1000 space works.
def looping2(n=2):
    ktemp = np.empty((n, n, nl, nl))
    for i, j in np.ndindex(ktemp.shape[:2]):
        I0_ = I0[i, j]
        # pull out the (nl, nl) window anchored at I0_, then scale and shift it
        temp = KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl]
        temp = temp / a_map[i, j] + k_l_th
        ktemp[i, j, ...] = temp
    # reduce the whole (n, n, nl, nl) stack with one broadcast matmul
    K_PC = E @ ktemp @ E.T
    return K_PC
K = loop()
k4 = looping2(n=X)
np.allclose(k4, K.transpose(2,3,0,1)) # true
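The chained @ works because matmul broadcasts over the leading axes: E (shape (q, nl)) is applied to every (nl, nl) block of the (n, n, nl, nl) stack. A minimal self-contained sketch with dummy sizes (the names and sizes below are illustrative, not the real data):

import numpy as np

n, nl, q = 3, 5, 2                      # small stand-in sizes
E = np.random.rand(q, nl)               # projection matrix
ktemp = np.random.rand(n, n, nl, nl)    # stack of (nl, nl) blocks

# matmul broadcasts E against every block: (q,nl) @ (n,n,nl,nl) @ (nl,q)
K_PC = E @ ktemp @ E.T                  # shape (n, n, q, q)

# explicit per-cell loop for comparison
ref = np.empty((n, n, q, q))
for i, j in np.ndindex(n, n):
    ref[i, j] = E @ ktemp[i, j] @ E.T

print(K_PC.shape)              # (3, 3, 2, 2)
print(np.allclose(K_PC, ref))  # True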
I haven't tried to vectorize the I0_ mapping. My focus is on generalizing the double dot product.
The equivalent einsum is:
K_PC = np.einsum('ij,...jk,lk->il...', E, ktemp, E)
That raises ValueError: iterator is too large for n=7.
But with the latest NumPy version,
K_PC = np.einsum('ij,...jk,lk->il...', E, ktemp, E, optimize='optimal')
does work for the full 7x7x10x10 output.
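Note the output order: '->il...' puts the two reduced (q, q) axes first, so the einsum result is the transpose of the @ result. A quick sanity check on dummy arrays (a sketch only, with the same illustrative sizes as above):

import numpy as np

n, nl, q = 3, 5, 2
E = np.random.rand(q, nl)
ktemp = np.random.rand(n, n, nl, nl)

via_matmul = E @ ktemp @ E.T                                      # (n, n, q, q)
via_einsum = np.einsum('ij,...jk,lk->il...', E, ktemp, E,
                       optimize='optimal')                        # (q, q, n, n)
print(np.allclose(via_einsum, via_matmul.transpose(2, 3, 0, 1)))  # True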
Timings aren't promising: 2.2 s for the original looping, 3.9 s for the big matmul (or einsum). (I get the same 2x speedup with original_mod_app.)
============
time for constructing a (10,10,1000,1000) array (iteratively):
In [31]: %%timeit
...: ktemp = np.empty((n,n,nl,nl))
...: for i,j in np.ndindex(ktemp.shape[:2]):
...: I0_ = I0[i, j]
...: temp = KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl]
...: ktemp[i,j,...] = temp
...:
1 loop, best of 3: 749 ms per loop
time for reducing that to (10,10,7,7) with @ (longer than the construction)
In [32]: timeit E @ ktemp @ E.T
1 loop, best of 3: 1.17 s per loop
time for the same two operations, but with the reduction in the loop
In [33]: %%timeit
...: ktemp = np.empty((n,n,q,q))
...: for i,j in np.ndindex(ktemp.shape[:2]):
...: I0_ = I0[i, j]
...: temp = KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl]
...: ktemp[i,j,...] = E @ temp @ E.T
1 loop, best of 3: 858 ms per loop
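For completeness, this loop-with-inner-reduction pattern can be packaged like looping2 (a sketch only, assuming the same globals nl, I0, KK_l_obs, a_map, k_l_th and E; the name looping3 is mine):

def looping3(n=2):
    q = E.shape[0]
    K_PC = np.empty((n, n, q, q))
    for i, j in np.ndindex(n, n):
        I0_ = I0[i, j]
        temp = KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl]
        temp = temp / a_map[i, j] + k_l_th
        # reduce immediately, so only a small (q, q) block is stored
        K_PC[i, j] = E @ temp @ E.T
    return K_PC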
Performing the dot product within the loop reduces the size of the subarrays that are saved to ktemp, thus making up for the calculation cost. The dot operation on the big array is, by itself, more expensive than your loop. Even if we could 'vectorize' KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl], it wouldn't make up for the cost of handling that big array.
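If you did want to vectorize that windowed slicing, sliding_window_view (numpy.lib.stride_tricks, NumPy >= 1.20) builds every (nl, nl) window of KK_l_obs as a strided view. A sketch, with the caveat above that the fancy indexing still materializes the full (n, n, nl, nl) array and the big reduction still dominates:

from numpy.lib.stride_tricks import sliding_window_view

# strided view of every (nl, nl) window; no data copied yet
windows = sliding_window_view(KK_l_obs, (nl, nl))   # (NL-nl+1, NL-nl+1, nl, nl)

# window whose top-left corner is (I0[i, j], I0[i, j]) for each map cell
ktemp = windows[I0, I0] / a_map[..., None, None] + k_l_th   # (n, n, nl, nl)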