Why do nvvp and Nsight's profiler give different results?

Submitted by 六月ゝ 毕业季﹏ on 2019-12-13 03:44:53

Question


I want to test the gst_inst_128bit counter. For the same program, nvvp reports a large number of gst_inst_128bit instructions executed, while Nsight's profiler reports four times as many gst_inst_32bit instructions instead. Both should be profiling the same program. How can this happen?

The experiment was run on Linux with CUDA 5.0 on a GTX 580. The kernel only copies data from one array to another.

In main:

cudaMalloc((void**)&dev_a, NUM * sizeof(float));
cudaMalloc((void**)&dev_b, NUM * sizeof(float));
kernel<<<grid,block>>>((uint4 *)dev_a, (uint4 *)dev_b);

The kernel:

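// Each thread copies LOOP/4 uint4 elements (16 bytes each), striding by the
// total thread count (assuming GRID_NUM blocks of THREAD_NUM threads).
__global__ void kernel(uint4 *a, uint4 *b){
        unsigned int id = blockIdx.x * THREAD_NUM + threadIdx.x;
        for(unsigned int i = 0;i < LOOP/4;i++){
                b[id + i * GRID_NUM * THREAD_NUM] = a[id + i * GRID_NUM * THREAD_NUM];
        }
        return;
}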
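
The two counter readings are numerically consistent with the same copy loop being compiled in two different ways: one 16-byte uint4 store can be issued as a single 128-bit store or as four 32-bit stores. The sketch below only illustrates that arithmetic on the host. Uint4 is a stand-in for CUDA's uint4 and stores_per_thread is a hypothetical helper; neither appears in the original program, and the sketch does not claim to show what either profiler actually measured.

```cpp
#include <cassert>
#include <cstddef>

// Host-side stand-in for CUDA's uint4 (four 32-bit words), so this compiles
// without the CUDA headers.
struct Uint4 { unsigned int x, y, z, w; };
static_assert(sizeof(Uint4) == 16, "uint4 is expected to be 128 bits wide");

// Hypothetical helper: number of store instructions one thread needs to write
// `elements` Uint4 values when each store moves `store_bits` bits.
inline std::size_t stores_per_thread(std::size_t elements, std::size_t store_bits) {
    const std::size_t element_bits = sizeof(Uint4) * 8;  // 128
    assert(store_bits > 0 && element_bits % store_bits == 0);
    return elements * (element_bits / store_bits);
}
```

For any loop count, stores_per_thread(n, 32) is four times stores_per_thread(n, 128), which matches the factor of four reported in the question.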

Answer 1:


The profiler in Nsight Eclipse Edition (Nsight EE) and the standalone Visual Profiler (nvvp) on Linux are built on the same codebase. Please make sure that:

  1. You are using the same executable in both tools.
  2. The environment variable values (e.g. LD_LIBRARY_PATH) are the same in both tools.

Note that the Nsight EE launch UI can be slightly confusing. If you click "Profile" after debugging the debug build, it may actually profile the debug executable, because it tries to keep all the custom launch settings (e.g. command-line arguments, working folder) you may have set up. From the main menu, click Run->Profile Configurations... to see the settings Nsight uses when profiling your application.
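
One way to run these checks from inside the program is to report, at startup, which binary is running, what LD_LIBRARY_PATH it sees, and whether it was built in device-debug mode, then compare the output under both profilers. The following is a minimal sketch under several assumptions: it is Linux-only (it reads /proc/self/exe), all helper names are hypothetical, and it relies on __CUDACC_DEBUG__, the macro nvcc documents for builds compiled with -G. Whether the toolkit version in use defines that macro should be verified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <unistd.h>   // readlink (POSIX)

// Hypothetical helper: path of the running binary, via /proc/self/exe (Linux only).
std::string running_executable() {
    char buf[4096];
    ssize_t n = readlink("/proc/self/exe", buf, sizeof(buf));
    if (n <= 0) return "<unknown>";
    return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical helper: value of an environment variable, or "<unset>".
std::string env_or_unset(const char *name) {
    assert(name != nullptr);
    const char *value = std::getenv(name);
    return value ? std::string(value) : std::string("<unset>");
}

// True only when this file was compiled by nvcc in device-debug mode (-G).
// Assumption: the toolkit defines __CUDACC_DEBUG__; a host-only compiler never does.
inline bool device_debug_build() {
#ifdef __CUDACC_DEBUG__
    return true;
#else
    return false;
#endif
}

std::string build_flavor() {
    return device_debug_build() ? "device-debug (-G)" : "no -G";
}

// One-line summary intended to be printed at the top of main().
std::string launch_summary() {
    return "exe=" + running_executable() +
           " LD_LIBRARY_PATH=" + env_or_unset("LD_LIBRARY_PATH") +
           " build=" + build_flavor();
}
```

Printing launch_summary() to stderr as the first statement of main and profiling once from each tool would show whether the two runs differ in executable path, library path, or build mode.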



Source: https://stackoverflow.com/questions/14254512/nvvp-and-nsights-profiler-give-a-different-result
