I have a sum of sums that I want to speed up. In one case it is:
S_{x,y,k,l} Fu_{ku} Fv_{lv} Fx_{kx} Fy_{ly}
In the other case it is:
S_{x,y} ( S_{k,l} F
I'll start a new answer since the problem has changed.
Try this:
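import numpy as np
# E[u,v,x,y] = P[x,y] * sum over k,l of Fu[u,k] * Fv[v,l] * Fx[x,k] * Fy[y,l] * B[k,l]
E = np.einsum('uk, vl, xk, yl, xy, kl->uvxy', Fu, Fv, Fx, Fy, P, B)
E1 = np.einsum('uvxy->uv', E)             # sum of E over x and y
E2 = np.einsum('uvxy->uv', np.square(E))  # sum of E**2 over x and y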
In my timings, this runs just as fast as the computation of I1_.
Here is my test code: http://pastebin.com/ufwy7cLy
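Separately from the linked test code, here is a minimal sketch for checking the einsum expression against explicit loops on small random inputs. The array sizes, the random data, and the names E_ref, E_opt and E1_direct are assumptions made up for illustration; they are not taken from the question. It also tries two variants that should give the same numbers: passing optimize=True to np.einsum, and computing E1 in a single call without building E first. Whether either is faster will depend on your array sizes, so time them on your own data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small sizes, chosen only so the loop reference finishes quickly.
U, V, X, Y, K, L = 2, 3, 4, 5, 6, 7

Fu = rng.standard_normal((U, K))
Fv = rng.standard_normal((V, L))
Fx = rng.standard_normal((X, K))
Fy = rng.standard_normal((Y, L))
P = rng.standard_normal((X, Y))
B = rng.standard_normal((K, L))

# The expression from the answer: contract over k and l, keep u, v, x, y.
E = np.einsum('uk,vl,xk,yl,xy,kl->uvxy', Fu, Fv, Fx, Fy, P, B)
E1 = np.einsum('uvxy->uv', E)
E2 = np.einsum('uvxy->uv', np.square(E))

# Reference implementation with explicit loops (slow, for checking only).
E_ref = np.zeros((U, V, X, Y))
for u in range(U):
    for v in range(V):
        for x in range(X):
            for y in range(Y):
                s = 0.0
                for k in range(K):
                    for l in range(L):
                        s += Fu[u, k] * Fv[v, l] * Fx[x, k] * Fy[y, l] * B[k, l]
                E_ref[u, v, x, y] = P[x, y] * s

# Variant 1: let einsum pick a contraction order.
E_opt = np.einsum('uk,vl,xk,yl,xy,kl->uvxy', Fu, Fv, Fx, Fy, P, B, optimize=True)

# Variant 2: E1 alone does not need the full 4-D array.
# E2 still does, because the square is taken before summing over x and y.
E1_direct = np.einsum('uk,vl,xk,yl,xy,kl->uv', Fu, Fv, Fx, Fy, P, B)

print(np.allclose(E, E_ref), np.allclose(E_opt, E), np.allclose(E1_direct, E1))
```

If the three checks print True, the einsum call matches the loop definition on this data and both variants agree with it.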