Question
I tried MATLAB's convolution functions conv2 and convn with gpuArray, for example convn(gpuArray.rand(100,100,10,'single'),gpuArray.rand(5,'single')), and compared the result to the CPU version convn(rand(100,100,10),rand(5)). Unfortunately, the GPU version is much slower than the CPU version, which is especially noticeable when I put the call into a loop (which is what I need to do). Does anyone know a fast alternative for convolution in MATLAB on the GPU with relatively small filter kernels, from 5x5 to 14x14?
Answer 1:
In your test case, GPU performance is limited by the small problem size: a [100x100x10] data array and a [5x5] kernel.
Actual performance also depends on the GPU and CPU models. For your data size (the second test case below), I get about a 2.75x speedup with a Tesla M2090 GPU over a Xeon E5-2609 CPU.
I used the following MATLAB test code:
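% Problem size: m-by-m-by-10 data, k-by-k kernel, n timed iterations
m=1000;
n=100;
k=5;
% Warm-up call on the GPU, outside the timed region
gc=convn(gpuArray.rand(m,m,10,'single'),gpuArray.rand(k,'single'));
% GPU timing (the loop also generates the random inputs on the GPU)
tic;
for i=1:n
    gc=convn(gpuArray.rand(m,m,10,'single'),gpuArray.rand(k,'single'));
end
toc
% Warm-up call on the CPU, outside the timed region
c=convn(rand(m,m,10,'single'),rand(k,'single'));
% CPU timing (the loop also generates the random inputs on the CPU)
tic;
for i=1:n
    c=convn(rand(m,m,10,'single'),rand(k,'single'));
end
toc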
With m=1000; n=100; k=5; I got a very good speedup (11.6x) on the GPU. In each pair of timings, the first line is the GPU loop and the second is the CPU loop.
Elapsed time is 2.367453 seconds.
Elapsed time is 27.502952 seconds.
But with m=100; n=1000; k=5; I got only about 2.75x.
Elapsed time is 1.206053 seconds.
Elapsed time is 3.330559 seconds.
With m=100; n=1000; k=14; the speedup improves (4.84x).
Elapsed time is 2.804957 seconds.
Elapsed time is 13.585698 seconds.
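The tic/toc loops above also time the random-number generation, and GPU calls can return before the work has finished. As a rough cross-check, here is a minimal sketch that creates the inputs once and times only convn, using timeit on the CPU and gputimeit (Parallel Computing Toolbox) on the GPU. The variable names and the choice of these two timing functions are my assumptions, not part of the original test; the three [m k] pairs mirror the cases reported above, and the numbers will differ on other hardware.

```matlab
% Sketch: time only the convolution, with inputs created outside the timer.
% Assumes a supported GPU and the Parallel Computing Toolbox.
cases = [1000 5; 100 5; 100 14];      % each row is [m k], as in the cases above
speedup = zeros(size(cases,1),1);

for idx = 1:size(cases,1)
    m = cases(idx,1);
    k = cases(idx,2);

    % CPU inputs in single precision, matching the GPU inputs
    a = rand(m,m,10,'single');
    h = rand(k,'single');

    % GPU copies of the same data, transferred once before timing
    ag = gpuArray(a);
    hg = gpuArray(h);

    tCpu = timeit(@() convn(a,h));        % repeated CPU runs
    tGpu = gputimeit(@() convn(ag,hg));   % waits for the GPU work to finish

    speedup(idx) = tCpu/tGpu;
    fprintf('m=%d k=%d: CPU %.4f s, GPU %.4f s, speedup %.2fx\n', ...
            m, k, tCpu, tGpu, speedup(idx));
end
```

If your real loop can keep its data on the GPU between iterations, this measurement is closer to what you will see than a loop that regenerates or re-uploads the inputs every time.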
Source: https://stackoverflow.com/questions/19477224/matlab-convolution-using-gpu