Is it possible to speed up this MATLAB script?

Anonymous (unverified), submitted on 2019-12-03 01:21:01

Question:

I've run into performance problems, so I want to speed up some slow-running scripts, but I've run out of ideas on how to do it. I often get stuck on the indexing, and I find this kind of abstract thinking very difficult.

The script is

    tic
    n = 1000;
    d = 500;
    X = rand(n, d);
    R = rand(n, n);
    F = zeros(d, d);
    for i = 1:n
        for j = 1:n
            F = F + R(i,j) * ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)));
        end
    end
    toc

Answer 1:

Discussion & Solution Codes

A few approaches using bsxfun can be suggested here. Also, read on to see how to get a 30x+ speedup on a problem like this!

Approach #1 (Naive vectorized approach)

To handle both operations, the subtraction between rows of X and the subsequent element-wise multiplication that forms the outer product of each difference with itself, a naive bsxfun-based approach builds a 4D intermediate array whose slices correspond to ((X(i,:)-X(j,:))' * (X(i,:)-X(j,:))). After that, one multiplies by R to get the final output F. This is implemented as shown next:

    v1 = bsxfun(@minus,X,permute(X,[3 2 1]));                            %// n x d x n
    v2 = bsxfun(@times,permute(v1,[1 3 2]),permute(v1,[1 3 4 2]));       %// n x n x d x d
    F = reshape(R(:).'*reshape(v2,[],d^2),d,[]);                         %// d x d
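As a sanity check, Approach #1 can be compared against the original double loop on a deliberately tiny problem. This is only a sketch: the sizes `n = 5`, `d = 3` and the names `F_loop`, `F_vec` and `max_err` are assumptions chosen for illustration, not part of the original answer.

```matlab
% Tiny sizes (assumed for illustration) so the 4D intermediate stays small.
n = 5;
d = 3;
X = rand(n, d);
R = rand(n, n);

% Reference result from the original double loop.
F_loop = zeros(d);
for i = 1:n
    for j = 1:n
        v = X(i,:) - X(j,:);
        F_loop = F_loop + R(i,j) * (v' * v);
    end
end

% Approach #1: v1(i,:,j) holds X(i,:) - X(j,:); v2(i,j,:,:) holds its outer product.
v1 = bsxfun(@minus, X, permute(X, [3 2 1]));
v2 = bsxfun(@times, permute(v1, [1 3 2]), permute(v1, [1 3 4 2]));
F_vec = reshape(R(:).' * reshape(v2, [], d^2), d, []);

% Largest absolute difference between the two results.
max_err = max(abs(F_vec(:) - F_loop(:)));
fprintf('max abs difference: %g\n', max_err);
```

The difference should be at the level of floating-point round-off, since both versions sum the same terms in a different order.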

Approach #2 (Not-so-naive vectorized approach)

The approach above goes into 4D, which could slow things down. Instead, you can keep the intermediate data at no more than 3D by reshaping. This is listed next:

    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    sub1_2d = reshape(permute(sub1,[1 3 2]),n^2,[]);
    mult1 = bsxfun(@times,sub1_2d,permute(sub1_2d,[1 3 2]));
    F = reshape(R(:).'*reshape(mult1,[],d^2),d,[]);
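It helps to estimate how large these intermediates get before running the fully vectorized versions. The sketch below assumes double precision (8 bytes per element) and ignores any temporaries MATLAB may create; the variable names are made up for illustration. Note that `mult1` in Approach #2 is 3D but has the same number of elements, n^2*d^2, as the 4D `v2` in Approach #1.

```matlab
% Sizes from the question.
n = 1000;
d = 500;

bytes_per_double = 8;   % assumption: all arrays are double precision
gib = 2^30;             % bytes per GiB

% Subtraction result (v1 / sub1): n x d x n elements.
mem_sub = bytes_per_double * n^2 * d / gib;

% Product array (v2 in Approach #1, mult1 in Approach #2): n^2 * d^2 elements.
mem_prod = bytes_per_double * n^2 * d^2 / gib;

fprintf('subtraction array: %.1f GiB\n', mem_sub);
fprintf('product array:     %.1f GiB\n', mem_prod);
```

At the question's sizes the product array alone is on the order of terabytes, so Approaches #1 and #2 are realistic only for much smaller n and d.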

Approach #3 (Hybrid approach)

Now, you can build a hybrid approach on top of Approach #2: vectorized subtractions plus loopy multiplications. The benefit is that each iteration uses fast matrix multiplication, and the number of loop iterations drops from the n^2 of the original code to n, which should make it much more efficient. Thanks to @Dev-iL for suggesting this idea! Here's the code:

    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2]));

    F = zeros(d);
    for k = 1:size(sub1,3)
        blk = sub1(:,:,k);
        F = F + blk.'*blk;
    end
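The reason this works: row i of the k-th slice is sqrt(R(i,k)) * (X(i,:) - X(k,:)), so `blk.'*blk` sums R(i,k) times the outer product of that difference over all i in a single matrix multiplication. The sketch below is a variation on the answer's code, not the answer's own method: it builds each slice inside the loop, so the n-by-d-by-n array is never stored. The tiny sizes and the names `F_ref`, `F_lean` and `max_err` are assumptions for illustration, and its speed relative to Approach #3 is not measured here.

```matlab
% Tiny sizes (assumed for illustration) so the reference loop runs instantly.
n = 6;
d = 4;
X = rand(n, d);
R = rand(n, n);   % rand is non-negative, so sqrt(R) is real

% Reference result from the original double loop.
F_ref = zeros(d);
for i = 1:n
    for j = 1:n
        v = X(i,:) - X(j,:);
        F_ref = F_ref + R(i,j) * (v' * v);
    end
end

% Variation: form one n x d slice per iteration instead of a 3D array.
F_lean = zeros(d);
for k = 1:n
    blk = bsxfun(@minus, X, X(k,:));            % row i is X(i,:) - X(k,:)
    blk = bsxfun(@times, blk, sqrt(R(:,k)));    % scale row i by sqrt(R(i,k))
    F_lean = F_lean + blk.' * blk;              % adds sum_i R(i,k) * outer product
end

% Largest absolute difference between the two results.
max_err = max(abs(F_lean(:) - F_ref(:)));
fprintf('max abs difference: %g\n', max_err);
```

The trade-off is that the subtraction is redone inside the loop, in exchange for holding only one n-by-d block in memory at a time.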

Benchmarking

Benchmarking code comparing the original approach against Approach #3:

    %// Parameters
    n = 500;
    d = 250;
    X = rand(n, d);
    R = rand(n, n);

    %// Warm up tic/toc.
    for k = 1:100000
        tic(); elapsed = toc();
    end

    disp('------------------------------ With Original Approach')
    tic
    F1 = zeros(d, d);
    for i=1:n
        for j=1:n
            F1 = F1 + R(i,j)*((X(i,:)-X(j,:))' * (X(i,:)-X(j,:)));
        end
    end
    toc, clear F1 i j

    disp('------------------------------ With Proposed Approach #3')
    tic
    sub1 = bsxfun(@minus,X,permute(X,[3 2 1]));
    sub1 = bsxfun(@times,sub1,permute(sqrt(R),[1 3 2]));

    F = zeros(d);
    for k = 1:size(sub1,3)
        blk = sub1(:,:,k);
        F = F + blk.'*blk;
    end
    toc

Runtime results

    ------------------------------ With Original Approach
    Elapsed time is 29.728571 seconds.
    ------------------------------ With Proposed Approach #3
    Elapsed time is 0.839726 seconds.

So, who's ready for a 30x+ speedup!? (In this run the ratio works out to roughly 35x.)


