I am trying to perform a large linear-algebra computation to transform a generic covariance matrix KK_l_obs (shape (NL, NL)) into a map of covariance matrices.
On a relatively modest machine (4 GB of memory), a matmul calculation on the whole 10x10x1000x1000 space works.
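For reference, here is a minimal setup sketch so the snippets below run standalone. The names (nl, q, X, KK_l_obs, k_l_th, a_map, I0, E) are the ones from the question; the sizes are inferred from the shapes quoted in the timings, and the data is just random filler:

import numpy as np

nl, q, X = 1000, 7, 10                       # sizes inferred from the (10,10,1000,1000) -> (10,10,7,7) shapes
NL = nl + X                                  # just large enough that every window fits
KK_l_obs = np.random.rand(NL, NL)            # placeholder for the big covariance matrix
k_l_th   = np.random.rand(nl, nl)            # placeholder additive term
a_map    = np.random.rand(X, X) + 1.0        # placeholder scaling map (kept away from zero)
I0       = np.random.randint(0, X, (X, X))   # placeholder window offsets; every window stays in bounds
E        = np.random.rand(q, nl)             # placeholder projection onto the reduced space

With that in place, the generalized loop is: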
def looping2(n=2):
    ktemp = np.empty((n, n, nl, nl))
    for i, j in np.ndindex(ktemp.shape[:2]):
        I0_ = I0[i, j]
        # pull out the (nl, nl) window, then scale and shift it
        temp = KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl]
        temp = temp / a_map[i, j] + k_l_th
        ktemp[i, j, ...] = temp
    # broadcasted double matmul: (q,nl) @ (n,n,nl,nl) @ (nl,q) -> (n,n,q,q)
    K_PC = E @ ktemp @ E.T
    return K_PC
K = loop()
k4 = looping2(n=X)
np.allclose(k4, K.transpose(2,3,0,1))  # True: matches the question's loop(), up to an axis reorder
I haven't tried to vectorize the I0_ mapping. My focus is on generalizing the double dot product.
The equivalent einsum is:

K_PC = np.einsum('ij,...jk,lk->il...', E, ktemp, E)

but that raises ValueError: iterator is too large for n=7.
But with the latest version,

K_PC = np.einsum('ij,...jk,lk->il...', E, ktemp, E, optimize='optimal')

works for the full 7x7x10x10 output.
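To see what optimize='optimal' is actually doing, np.einsum_path reports the contraction order and intermediate sizes it chooses; a quick check, with ktemp built as in the loop above:

path, info = np.einsum_path('ij,...jk,lk->il...', E, ktemp, E, optimize='optimal')
print(info)   # shows the two pairwise contractions and their estimated intermediates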
Timings aren't promising: 2.2 s for the original looping, 3.9 s for the big matmul (or einsum). (I get the same 2x speedup with original_mod_app.)
============
time for constructing a (10,10,1000,1000) array (iteratively):
In [31]: %%timeit
    ...: ktemp = np.empty((n,n,nl,nl))
    ...: for i,j in np.ndindex(ktemp.shape[:2]):
    ...:     I0_ = I0[i, j]
    ...:     temp = KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl]
    ...:     ktemp[i,j,...] = temp
    ...:
1 loop, best of 3: 749 ms per loop
time for reducing that to (10,10,7,7) with @ (longer than the construction):
In [32]: timeit E @ ktemp @ E.T
1 loop, best of 3: 1.17 s per loop
time for the same two operations, but with the reduction in the loop:
In [33]: %%timeit
    ...: ktemp = np.empty((n,n,q,q))
    ...: for i,j in np.ndindex(ktemp.shape[:2]):
    ...:     I0_ = I0[i, j]
    ...:     temp = KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl]
    ...:     ktemp[i,j,...] = E @ temp @ E.T
1 loop, best of 3: 858 ms per loop
Performing the dot product within the loop reduces the size of the subarrays that are saved to ktemp, which more than makes up for the calculation cost. The dot operation on the big array is, by itself, more expensive than your loop. Even if we could 'vectorize' KK_l_obs[I0_ : I0_ + nl, I0_ : I0_ + nl], it wouldn't make up for the cost of handling that big array.
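For completeness, here is what a 'vectorized' window extraction could look like with a strided view (a sketch; sliding_window_view needs NumPy 1.20+). The view itself is free, but fancy-indexing it with I0 still materializes the full (10,10,1000,1000) array, roughly 0.8 GB as float64, which is exactly the cost that dominates:

from numpy.lib.stride_tricks import sliding_window_view

windows = sliding_window_view(KK_l_obs, (nl, nl))          # zero-copy view, shape (NL-nl+1, NL-nl+1, nl, nl)
ktemp = windows[I0, I0] / a_map[..., None, None] + k_l_th  # this indexing copies; the big array gets built anyway
K_PC = E @ ktemp @ E.T                                     # same (10,10,7,7) result as looping2(n=X)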