MATLAB is slow when using a user-defined function with calculations on the GPU

大兔子大兔子 submitted on 2020-01-16 00:51:10

Question


When I run the code shown below, the tic/toc pair inside the function reports a very short time (well under 1 second) to go through all the lines. However, it actually takes around 2.3 seconds to get the outputs, as measured by the outer tic/toc pair.

tic

rnn.v = 11;
rnn.h = 101;
rnn.o = 7;
rnn.h_init = randn(1,rnn.h,'gpuArray');
rnn.W_vh = randn(rnn.v,rnn.h,'gpuArray');
rnn.W_hh = randn(rnn.h,rnn.h,'gpuArray');
rnn.W_ho = randn(rnn.h,rnn.o,'gpuArray');

inData.V = randn(10000,11,100,'gpuArray');
inData.TimeSteps = 100;
inData.BatchSize = 10000;

[H,OX] = forward_pass(rnn, inData)
toc

All the matrices in rnn and inData are gpuArrays, so all the calculations are carried out on the GPU. The outputs are also gpuArrays.

function [H,OX] = forward_pass(rnn, inData)
    tic;
    % initial hidden state values
    H_init = gpuArray(repmat(rnn.h_init,[inData.BatchSize,1]));

    % initialize state H
    H = zeros(inData.BatchSize, rnn.h, inData.TimeSteps,'gpuArray');

    % initialize OX (which is H * W_ho)
    OX = zeros(inData.BatchSize, rnn.o, inData.TimeSteps,'gpuArray');

    for t = 1 : inData.TimeSteps

        if t == 1
            HX_t = H_init * rnn.W_hh ...
                    + inData.V(:,:,t) * rnn.W_vh;
        else
            HX_t = H(:,:,(t-1)) * rnn.W_hh ...
                    + inData.V(:,:,t) * rnn.W_vh;
        end

        H(:,:,t) = tanh(HX_t);
        OX(:,:,t) = H(:,:,t) * rnn.W_ho;

    end

    toc;
end

Normally, calling gather() is slow. I didn't use gather() to transfer the outputs to the workspace, so I don't know why it is still so slow. It looks like the last line, "end", takes more than 2 seconds.

Does anyone know how to speed up the function call?


Answer 1:


First off, for proper benchmarking you need to call gather, either inside the function or afterwards. In the former case the function returns a non-GPU output, and in the latter case the output is a GPU-based datatype. The reason is that GPU operations in MATLAB are issued asynchronously, so toc can return before the queued work has actually finished; gather (or wait(gpuDevice)) forces the computation to complete. Now, back to your problem: you are using very few TimeSteps, so any optimization you try will not show a large effect. Here's an optimized version whose advantage grows as you increase TimeSteps:

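function [H,OX] = forward_pass(rnn, inData)

H = zeros(inData.BatchSize, rnn.h, inData.TimeSteps,'gpuArray');

% Input projection for all time steps in one multiply:
% rows (t-1)*BatchSize+1 : t*BatchSize of T belong to time step t
T = reshape(permute(inData.V,[1 3 2]),[],size(inData.V,2))*rnn.W_vh;

% First step: h_init is a single row, so broadcast it over the batch
H(:,:,1) = tanh(bsxfun(@plus,rnn.h_init * rnn.W_hh,T(1:size(inData.V,1),:)));

% Only the recurrent multiply has to stay inside the loop
for t = 2 : inData.TimeSteps
    H(:,:,t) = tanh( H(:,:,(t-1))*rnn.W_hh + ...
        T((t-1)*size(inData.V,1)+1: t*size(inData.V,1),:));
end

% Output projection for all time steps in one multiply,
% then back to BatchSize-by-o-by-TimeSteps
A = reshape(permute(H,[1 3 2]),[],size(H,2))*rnn.W_ho;
OX = permute(reshape(A,size(H,1),size(A,1)/size(H,1),[]),[1 3 2]);

return;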
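
The reshape/permute trick used for T and OX can be checked on the CPU. The following is a minimal sketch, not part of the original answer: it uses small, made-up sizes (B, h, o, T are hypothetical) and plain arrays instead of gpuArrays, and compares the per-time-step output multiply with the single batched multiply.

```matlab
% Sketch: per-time-step multiply vs. one batched multiply (plain CPU arrays).
% B, h, o, T are small hypothetical sizes chosen only for illustration.
B = 4; h = 3; o = 2; T = 5;
rng(0);                                  % reproducible random data
H    = randn(B, h, T);                   % stands in for the hidden states
W_ho = randn(h, o);                      % stands in for rnn.W_ho

% Per-time-step version: one small multiply per time step.
OX_loop = zeros(B, o, T);
for t = 1:T
    OX_loop(:,:,t) = H(:,:,t) * W_ho;
end

% Batched version: stack all time steps along the rows, multiply once.
A = reshape(permute(H, [1 3 2]), [], h) * W_ho;        % (B*T)-by-o
OX_batched = permute(reshape(A, B, T, o), [1 3 2]);    % B-by-o-by-T

% Largest element-wise difference between the two results.
max_diff = max(abs(OX_loop(:) - OX_batched(:)));
```

The two results should agree up to floating-point round-off. The batched form replaces many small multiplies with one large one, which is presumably where the GPU speedup comes from.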

Benchmarking

Test Case #1

Parameters

rnn.v = 11;
rnn.h = 5;
rnn.o = 7;
inData.TimeSteps = 10000;
inData.BatchSize = 10;

Results

---- Original Code :
Elapsed time is 5.678876 seconds.
---- Modified Code :
Elapsed time is 3.821059 seconds.

Test Case #2

Parameters

inData.TimeSteps = 50000; (rest are same as in Test Case #1)

Results

---- Original Code :
Elapsed time is 28.392290 seconds.
---- Modified Code :
Elapsed time is 19.031776 seconds.

Please note that these timings were measured on a GTX 750 Ti.
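
Because MATLAB queues GPU operations asynchronously, a plain tic/toc pair can under-report the time actually spent on the GPU. The sketch below shows two ways to take this into account; it is not the harness used for the numbers above, and it assumes Parallel Computing Toolbox and a supported GPU. The matrices A and B are hypothetical test data.

```matlab
% Sketch: timing GPU code so that asynchronous execution is accounted for.
% Assumes Parallel Computing Toolbox and a supported GPU are available.
dev = gpuDevice;                      % handle to the currently selected GPU

A = randn(2000, 'gpuArray');          % hypothetical test matrices
B = randn(2000, 'gpuArray');

% Option 1: tic/toc with explicit synchronization.
wait(dev);                            % let any pending GPU work finish first
tic;
C = A * B;
wait(dev);                            % block until the multiply has finished
t_sync = toc;

% Option 2: gputimeit calls the function several times and
% synchronizes with the GPU before reporting a time.
t_gputimeit = gputimeit(@() A * B);

fprintf('tic/toc with wait: %.4f s, gputimeit: %.4f s\n', t_sync, t_gputimeit);
```

For a function with two outputs, such as forward_pass, the number of outputs can be passed as a second argument: gputimeit(@() forward_pass(rnn, inData), 2).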



Source: https://stackoverflow.com/questions/25468639/matlab-is-slow-when-using-user-defined-function-with-calculation-in-gpu
