Why is OpenCV GPU code slower than CPU code?

I'm using OpenCV 2.4.2 + VS2010 on a notebook.
I ran some simple tests of the GPU module in OpenCV, but they showed the GPU code is 100 times slower than the CPU code. In this

5 answers

无人及你

2020-12-13 16:24

cvtColor isn't doing very much work: to make grey, all you have to do is average three numbers.

The cvtColor code on the CPU uses SSE2 instructions to process up to 8 pixels at once, and if you have TBB it uses all the cores/hyperthreads. The CPU also runs at several times the clock speed of the GPU, and finally you don't have to copy data onto the GPU and back.
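
To make the "average three numbers" point concrete, here is a minimal sketch of the per-pixel work in plain C++. It is not OpenCV's actual source; it assumes the weighted average that the OpenCV documentation gives for RGB to grey (0.299 R + 0.587 G + 0.114 B), done in fixed-point arithmetic. The function names `bgrToGray` and `bgrImageToGray` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert one BGR pixel to grey with the weights 0.114 B + 0.587 G + 0.299 R.
// The weights are scaled by 2^14 (1868 + 9617 + 4899 == 16384), so the whole
// conversion is three multiplies, three adds and one shift per pixel.
inline std::uint8_t bgrToGray(std::uint8_t b, std::uint8_t g, std::uint8_t r) {
    const int kB = 1868;  // 0.114 * 16384, rounded
    const int kG = 9617;  // 0.587 * 16384, rounded
    const int kR = 4899;  // 0.299 * 16384, rounded
    // Adding 8192 (half of 2^14) rounds to nearest before the shift.
    return static_cast<std::uint8_t>((kB * b + kG * g + kR * r + 8192) >> 14);
}

// Convert a packed BGR buffer (3 bytes per pixel) to a grey buffer.
inline std::vector<std::uint8_t> bgrImageToGray(const std::vector<std::uint8_t>& bgr) {
    assert(bgr.size() % 3 == 0);  // hypothetical layout: tightly packed BGR
    std::vector<std::uint8_t> gray(bgr.size() / 3);
    for (std::size_t i = 0; i < gray.size(); ++i) {
        gray[i] = bgrToGray(bgr[3 * i], bgr[3 * i + 1], bgr[3 * i + 2]);
    }
    return gray;
}
```

With so little arithmetic per pixel, the loop is limited mostly by how fast memory can be read and written, which is why moving the data elsewhere first rarely pays off for this operation alone.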

礼貌的吻别

2020-12-13 16:28

cvtColor is a small operation, and any performance boost you get from doing it on the GPU is vastly outweighed by the memory transfer time between host (CPU) and device (GPU). Minimizing the cost of this memory transfer is a primary challenge in any GPU computing.
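
A rough way to see the transfer cost is a back-of-the-envelope model: total GPU time is upload plus kernel plus download. The sketch below is only that model in C++. The struct `GpuCostEstimate`, the function `estimateGpuCost`, and every number you feed it (bus bandwidth, kernel time) are assumptions for illustration, not measurements of any particular hardware.

```cpp
#include <cassert>

// Estimated cost of one GPU operation, split into its three phases.
struct GpuCostEstimate {
    double uploadMs;    // host -> device copy
    double kernelMs;    // time spent computing on the GPU
    double downloadMs;  // device -> host copy
    double totalMs() const { return uploadMs + kernelMs + downloadMs; }
};

// bytesUp / bytesDown: bytes copied to and from the device.
// busGBps: ASSUMED effective bus bandwidth in GB/s (1 GB = 1e9 bytes).
// kernelMs: ASSUMED kernel execution time in milliseconds.
inline GpuCostEstimate estimateGpuCost(double bytesUp, double bytesDown,
                                       double busGBps, double kernelMs) {
    assert(busGBps > 0.0);
    const double bytesPerMs = busGBps * 1e9 / 1e3;
    GpuCostEstimate e;
    e.uploadMs = bytesUp / bytesPerMs;
    e.kernelMs = kernelMs;
    e.downloadMs = bytesDown / bytesPerMs;
    return e;
}
```

For example, at an assumed 4 GB/s, a 1920x1080 BGR frame (6,220,800 bytes) takes about 1.56 ms to upload and the grey result (2,073,600 bytes) about 0.52 ms to download, before the kernel has done anything. If the CPU version finishes in that time, the GPU cannot win no matter how fast the kernel is.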

抹茶落季

2020-12-13 16:39

Try running it more than once.

Excerpt from http://opencv.willowgarage.com/wiki/OpenCV%20GPU%20FAQ (Performance section):

Why first function call is slow?

That is because of initialization overheads. On first GPU function call Cuda Runtime API is initialized implicitly. Also some GPU code is compiled (Just In Time compilation) for your video card on the first usage. So for performance measure, it is necessary to do dummy function call and only then perform time tests.

If it is critical for an application to run GPU code only once, it is possible to use a compilation cache which is persistent over multiple runs. Please read nvcc documentation for details (CUDA_DEVCODE_CACHE environment variable).
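
The "dummy call first, then measure" advice can be sketched as a small timing helper. This is a generic C++ example using only the standard library; `timeEachRun` and `meanAfterWarmup` are hypothetical names, and it assumes the callable you pass in blocks until its result is ready (otherwise you would only be timing the launch).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Call `work` `runs` times and return the wall-clock time of each call in ms.
// Keeping the runs separate makes a slow first call easy to spot.
template <typename Work>
std::vector<double> timeEachRun(Work work, std::size_t runs) {
    std::vector<double> ms;
    ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
        work();
        const std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

// Mean of all runs except the first, which is treated as the warm-up call.
inline double meanAfterWarmup(const std::vector<double>& ms) {
    assert(ms.size() >= 2);  // need the warm-up run plus at least one real run
    double sum = 0.0;
    for (std::size_t i = 1; i < ms.size(); ++i) {
        sum += ms[i];
    }
    return sum / static_cast<double>(ms.size() - 1);
}
```

In use, you would wrap the whole GPU path (upload, the GPU call such as `cv::gpu::cvtColor`, download) in one lambda and the CPU path in another, run each several times, and compare `meanAfterWarmup` of the two. If the first entry of the GPU timings is far larger than the rest, initialization is what you were measuring.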

再見小時候

2020-12-13 16:44

Most of the answers above are actually wrong. The reason it is slower by a factor of 20,000 is of course not that 'the CPU clock speed is faster' or that 'it has to copy the data to the GPU' (the accepted answers). Those are factors, but citing only them ignores the fact that the GPU has vastly more computing power for a problem that is embarrassingly parallel. Attributing a 20,000x performance difference to them is plainly ridiculous. The author knew that something less straightforward was wrong. Solution:

Your problem is that CUDA needs to initialize! It always initializes on the first image, and that generally takes between 1 and 10 seconds, depending on the alignment of Jupiter and Mars. Now try this: do the computation twice and time both runs. You will probably see that the speeds are within the same order of magnitude, not 20,000x apart. Can you do something about this initialization? Nope, not that I know of. It's a snag.

Edit: I just re-read the post. You say you're running on a notebook. Those often have weak GPUs, and CPUs with a fair turbo boost.

误落风尘

2020-12-13 16:50

What GPU do you have?

Check its compute capability; that may be the reason.

https://developer.nvidia.com/cuda-gpus

This means that for devices with CC 1.3 and 2.0 binary images are ready to run. For all newer platforms, the PTX code for 1.3 is JIT’ed to a binary image. For devices with CC 1.1 and 1.2, the PTX for 1.1 is JIT’ed. For devices with CC 1.0, no code is available and the functions throw Exception. For platforms where JIT compilation is performed first, the run is slow.

http://docs.opencv.org/modules/gpu/doc/introduction.html
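
The rules in that passage boil down to a small lookup. Here is a sketch in C++ that encodes them exactly as quoted; it only describes the build the passage talks about (binaries for CC 1.3 and 2.0, PTX for 1.1 and 1.3), so a differently configured OpenCV build would behave differently. The enum `GpuCodePath` and both function names are made up for this example.

```cpp
#include <cassert>

// What happens the first time a GPU function runs on a given device,
// according to the quoted passage.
enum class GpuCodePath {
    BinaryReady,   // precompiled binary exists, no JIT needed
    JitFromPtx13,  // PTX for CC 1.3 is JIT-compiled on first use
    JitFromPtx11,  // PTX for CC 1.1 is JIT-compiled on first use
    Unsupported    // no code available, functions throw
};

// Map a compute capability (major.minor) to the code path described above.
inline GpuCodePath codePathFor(int major, int minor) {
    assert(major >= 1 && minor >= 0 && minor <= 9);
    const int cc = major * 10 + minor;  // e.g. 2.1 -> 21
    if (cc == 13 || cc == 20) return GpuCodePath::BinaryReady;
    if (cc == 11 || cc == 12) return GpuCodePath::JitFromPtx11;
    if (cc == 10) return GpuCodePath::Unsupported;
    return GpuCodePath::JitFromPtx13;  // every newer platform
}

// True when the first run is expected to be slow because of JIT compilation.
inline bool firstCallPaysJit(int major, int minor) {
    const GpuCodePath path = codePathFor(major, minor);
    return path == GpuCodePath::JitFromPtx13 || path == GpuCodePath::JitFromPtx11;
}
```

So a CC 2.1 or 3.0 notebook GPU would fall into the JIT case with that build, which fits a very slow first call followed by normal speed afterwards.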
