cudaMemcpyToSymbol performance

Submitted by 笑着哭i on 2019-12-02 08:49:18

Question


I have several functions that each load a variable into constant device memory and then launch a kernel. I noticed that the first load into constant memory takes 0.6 seconds, while subsequent loads are very fast (0.0008 seconds). This behaviour occurs regardless of which function is called first in main. Example code below:

        __constant__ double res1;

        __global__ void kernel1(...) {...}

        void function1() {
            double resHost = 255 / ((double) size);
            CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res1, &resHost, sizeof(double)));


            //prepare and launch kernel
        }

        __constant__ double res2;

        __global__ void kernel2(...) {...}

        void function2() {
            double resHost = 255 / ((double) size);
            CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res2, &resHost, sizeof(double)));


            //prepare and launch kernel
        }

        int main(){
            function1(); //takes 0.6 seconds for loading
            function2(); // takes 0.0008 seconds for loading
            function1(); //takes 0.0008 seconds for loading

            return 0;
        }

Why is this happening? Can I avoid it?


Answer 1:


Why is this happening?

Lazy runtime API context establishment and setup.

Can I avoid it?

No. The first runtime API call that requires a context incurs significant setup latency; in your case, that is the first cudaMemcpyToSymbol call.
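The setup cost itself cannot be removed, but it can be moved to a known point so that it is not attributed to cudaMemcpyToSymbol when timing. The sketch below is a minimal illustration, not code from the question: the symbol `res` and the helpers `warm_up_context` and `time_symbol_copy_ms` are hypothetical names, and calling `cudaFree(0)` is a commonly used idiom for forcing context creation, assumed here to be sufficient on your setup. Actual timings depend on the driver, device, and CUDA version.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical constant symbol, standing in for res1/res2 from the question.
__constant__ double res;

// Force the runtime to establish its context now. cudaFree(0) frees nothing,
// but it is a runtime call that needs a context, so the one-time setup
// cost is paid here instead of in the first "real" call.
cudaError_t warm_up_context() {
    return cudaFree(0);
}

// Time a single cudaMemcpyToSymbol call with a host clock, in milliseconds.
double time_symbol_copy_ms() {
    double resHost = 255.0 / 1024.0;  // arbitrary illustrative value
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaMemcpyToSymbol(res, &resHost, sizeof(double));
    auto stop = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpyToSymbol: %s\n", cudaGetErrorString(err));
    }
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    // Pay the context setup cost up front and measure it separately.
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = warm_up_context();
    auto stop = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "warm-up failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    double setup_ms =
        std::chrono::duration<double, std::milli>(stop - start).count();

    std::printf("context setup: %.3f ms\n", setup_ms);
    std::printf("first copy:    %.3f ms\n", time_symbol_copy_ms());
    std::printf("second copy:   %.3f ms\n", time_symbol_copy_ms());
    return 0;
}
```

With the warm-up in place, the first and second copies should take a similar amount of time, and the large one-off delay should show up in the "context setup" line. Total program time is not expected to change.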



Source: https://stackoverflow.com/questions/49829275/cudamemcpytosymbol-performance
