CUDA-parallelized raytracer: very low speedup

Submitted by 与世无争的帅哥 on 2019-12-08 17:03:32
talonmies
  1. Is it normal to get a speedup of 3x or 4x in a GPU-parallelized raytracer against a sequential code?

How long is a piece of string? There is no answer to this question.

  2. Do you see anything wrong in the CUDA setup or in the code that could be causing this behaviour?

Yes, as noted in comments, you are using a completely inappropriate block size which is wasting approximately 85% of the potential computational capacity of your GPU.
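One way a badly chosen block size wastes capacity can be put in numbers. The block size from the question is not reproduced here, so the figures below are hypothetical illustrations only. They rely on the fact that NVIDIA GPUs schedule threads in warps of 32, so a block of T threads occupies ceil(T/32) full warps regardless of how many lanes do useful work:

```latex
% Fraction of warp lanes doing useful work for a block of T threads (warp size 32)
U(T) = \frac{T}{32 \,\lceil T / 32 \rceil}

% Hypothetical block sizes, not taken from the question:
U(4)   = \frac{4}{32}   = 12.5\%
U(33)  = \frac{33}{64}  \approx 51.6\%
U(256) = \frac{256}{256} = 100\%
```

Idle warp lanes are only one contributor. Small blocks can also run into per-multiprocessor limits on resident blocks, so this is a rough lower bound on the waste rather than a full occupancy model.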

  3. Am I missing something important?

Yes, the answer to this question. Setting correct execution parameters is about 50% of the practical performance tuning requirements in CUDA, and you should be able to obtain noticeable performance improvements just by selecting a sane block size. Beyond this, careful profiling should be your next port of call.
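As a rough sketch of what "a sane block size" can look like for a one-thread-per-pixel kernel, the following assumes a 16 x 16 block (256 threads, a multiple of the 32-thread warp size) and rounds the grid up to cover the image. The kernel `shadePixels`, the helper `ceilDiv`, and the image dimensions are hypothetical placeholders, not the asker's code. The call to `cudaOccupancyMaxPotentialBlockSize` only asks the runtime for a suggested starting point, and profiling should still decide the final value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a ray tracing kernel: one thread per pixel.
__global__ void shadePixels(float* image, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;             // guard partial blocks at the image edges
    image[y * width + x] = static_cast<float>(x + y);  // placeholder for the real shading work
}

// Round-up integer division, used to cover the whole image with blocks.
inline unsigned int ceilDiv(unsigned int n, unsigned int d)
{
    return (n + d - 1) / d;
}

int main()
{
    const int width = 1000, height = 700;  // assumed image size, for illustration only

    float* image = nullptr;
    if (cudaMalloc(&image, sizeof(float) * width * height) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }

    // 16 x 16 = 256 threads per block, i.e. 8 full warps and no idle lanes.
    dim3 block(16, 16);
    dim3 grid(ceilDiv(width, block.x), ceilDiv(height, block.y));

    shadePixels<<<grid, block>>>(image, width, height);
    cudaError_t err = cudaGetLastError();               // catches invalid launch configurations
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // catches errors during execution
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(image);
        return 1;
    }

    // Ask the runtime which block size maximizes theoretical occupancy for this kernel.
    int minGridSize = 0, suggestedBlockSize = 0;
    cudaOccupancyMaxPotentialBlockSize(&minGridSize, &suggestedBlockSize, shadePixels, 0, 0);

    std::printf("launched grid %u x %u with block %u x %u\n", grid.x, grid.y, block.x, block.y);
    std::printf("runtime-suggested threads per block: %d\n", suggestedBlockSize);

    cudaFree(image);
    return 0;
}
```

The suggested size is a total thread count, so for a 2D launch it still has to be factored into block dimensions (for example 32 x 8 for 256 threads). Which shape works best depends on the memory access pattern and is worth measuring.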

[This answer was assembled from comments and added as a community wiki entry to get this (very broad) question off the unanswered list in the absence of enough close votes to close it.]
