SVD speed in CPU and GPU

后端 未结 4 1414
日久生厌
日久生厌 2021-01-02 13:57

I\'m testing svd in Matlab R2014a and it seems that there is no CPU vs GPU speedup. I\'m using a GTX 460 car

4条回答
  •  野趣味
    野趣味 (楼主)
    2021-01-02 14:35

    Generally SVD is a difficult to paralellize routine. You can check here that with a high end Tesla card, the speedup is not very impressive.

    You have a GTX460 card - Fermi architecture. The card is optimized for gaming (single precision computations), not HPC (double precision computation). The Single Precision / Double Precision throughput ratio is 12. So the card has 873 GFLOPS SP / 72 GFLOPS DP. Check here.

    So if the Md array uses double precision elements, then the computation on it would be rather slow. Also there's a high chance that when calling the CPU routine, all CPU cores will get utilized, reducing the possible gain of running the routine on the GPU. Plus, in the GPU run you pay time for transferring the buffer to the device.

    Per Divakar's suggestion, you could use Md = single(Md) to convert your array to single precision and run the benchmark again. You can try and go with a bigger dataset size to see if something changes. I don't expect to much gain for this routine on your GPU.

    Update 1:

    After you posted the results, I saw that the DP/SP time ratio is 2. On the CPU side this is normal, because you can fit 2 times less double values in SSE registers. However, a ratio of only 2 on the GPU side means that the gpu code does not make best use of the SM cores - because the theoretical ratio is 12. In other words, I would have expected much better SP performance for an optimized code, compared to DP. It seems that this is not the case.

提交回复
热议问题