Is MATLAB's bsxfun the best? Python's numpy.einsum?


Question


I have a very large multiply and sum operation that I need to implement as efficiently as possible. The best method I've found so far is bsxfun in MATLAB, where I formulate the problem as:

L = 10000;
x = rand(4,1,L+1);
A_k = rand(4,4,L);
tic
for k = 2:L
    i = 2:k;
    x(:,1,k+1) = x(:,1,k+1)+sum(sum(bsxfun(@times,A_k(:,:,2:k),x(:,1,k+1-i)),2),3);
end
toc

Note that L will be larger in practice. Is there a faster method? It's strange that I need to first add the singleton dimension to x and then sum over it, but I can't get it to work otherwise.

It's still much faster than any other method I've tried, but not enough for our application. I've heard rumors that the Python function numpy.einsum may be more efficient, but I wanted to ask here first before I consider porting my code.

I'm using MATLAB R2017b.


Answer 1:


I believe both of your summations can be removed, but I only removed the easier one for the time being. The summation over the second dimension is trivial, since it only affects the A_k array:

B_k = sum(A_k,2);
for k = 2:L
    i = 2:k;
    x(:,1,k+1) = x(:,1,k+1) + sum(bsxfun(@times,B_k(:,1,2:k),x(:,1,k+1-i)),3);
end

With this single change the runtime is reduced from ~8 seconds to ~2.5 seconds on my laptop.
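
As a quick sanity check, here is a minimal sketch (assuming the same random setup as in the question, just with a smaller L so the values stay finite and the check runs quickly) that compares the pre-summed B_k loop against the original bsxfun loop:

rng(0);                          % same data for both runs
L = 200;                         % small L keeps the values finite
x0  = rand(4,1,L+1);
A_k = rand(4,4,L);

x1 = x0;                         % original bsxfun formulation
for k = 2:L
    i = 2:k;
    x1(:,1,k+1) = x1(:,1,k+1)+sum(sum(bsxfun(@times,A_k(:,:,2:k),x1(:,1,k+1-i)),2),3);
end

B_k = sum(A_k,2);                % second-dimension sum pulled out of the loop
x2 = x0;
for k = 2:L
    i = 2:k;
    x2(:,1,k+1) = x2(:,1,k+1) + sum(bsxfun(@times,B_k(:,1,2:k),x2(:,1,k+1-i)),3);
end

max(abs(x1(:)-x2(:))./abs(x1(:)))    % relative difference, on the order of eps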

The second summation could also be removed by transforming the times+sum into a matrix-vector product. It needs some singleton fiddling to get the dimensions right, but if you define an auxiliary array C_k that is B_k with the index along k reversed, you can generate the remaining sum as something like x*C_k, give or take a few calls to reshape.


So after a closer look I realized that my original assessment was overly optimistic: you have multiplications in both dimensions in your remaining term, so it's not a simple matrix product. Anyway, we can rewrite that term to be the diagonal of a matrix product. This implies that we're computing a bunch of unnecessary matrix elements, but this still seems to be slightly faster than the bsxfun approach, and we can get rid of your pesky singleton dimension too:

L = 10000;
x = rand(4,L+1);
A_k = rand(4,4,L);
B_k = squeeze(sum(A_k,2)).';   % L-by-4: row k holds the row sums of A_k(:,:,k)

tic
for k = 2:L
    ii = 1:k-1;
    x(:,k+1) = x(:,k+1) + diag(x(:,ii)*B_k(k+1-ii,:));
end
toc

This runs in ~2.2 seconds on my laptop, somewhat faster than the ~2.5 seconds obtained previously.
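
For reference, the diag(...) term is just the dot product of each row of x(:,ii) with the corresponding column of B_k(k+1-ii,:); the off-diagonal entries of the 4-by-4 product are the unnecessary elements mentioned above. A minimal illustration of that identity, on arbitrary data:

X = rand(4,7);
Y = rand(7,4);
d1 = diag(X*Y);          % full 4-by-4 product, then keep only the diagonal
d2 = sum(X.*Y.',2);      % the same 4 row-by-row dot products, nothing else
max(abs(d1-d2))          % agrees up to rounding error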




Answer 2:


Since you're using a recent version of MATLAB, you might try implicit expansion (broadcasting, available since R2016b) instead of bsxfun:

x(:,1,k+1) = x(:,1,k+1)+sum(sum(A_k(:,:,2:k).*x(:,1,k-1:-1:1),3),2);

I also changed the order of summation and removed the i variable for a further small improvement. On my machine with MATLAB R2017b, this was about 25% faster for L = 10000.
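
Put together with the question's setup, the whole thing becomes (just a sketch, combining the original loop with the line above):

L = 10000;
x = rand(4,1,L+1);
A_k = rand(4,4,L);

tic
for k = 2:L
    % implicit expansion replaces bsxfun; x(:,1,k-1:-1:1) is x(:,1,k+1-i) for i = 2:k
    x(:,1,k+1) = x(:,1,k+1)+sum(sum(A_k(:,:,2:k).*x(:,1,k-1:-1:1),3),2);
end
toc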



Source: https://stackoverflow.com/questions/46836480/is-matlabs-bsxfun-the-best-pythons-numpy-einsum
