“invalid configuration argument” error when calling a CUDA kernel?

悲哀的现实 2020-12-01 15:59

Here is my code:

int threadNum = BLOCKDIM/8;
dim3 dimBlock(threadNum,threadNum);
int blocks1 = nWidth/threadNum + (nWidth%threadNum == 0 ? 0 : 1);
int blocks         


        
2 Answers
  • 2020-12-01 16:20

    Just to add to the previous answers: you can also query the device limits at run time, so the code can run on other devices without hard-coding the number of threads you will use:

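    int device = 0;
    cudaGetDevice(&device);  // device currently used by this host thread
    cudaDeviceProp properties;
    cudaGetDeviceProperties(&properties, device);
    cout<<"using "<<properties.multiProcessorCount<<" multiprocessors"<<endl;
    cout<<"max threads per multiprocessor: "<<properties.maxThreadsPerMultiProcessor<<endl;
    cout<<"max threads per block: "<<properties.maxThreadsPerBlock<<endl;  // the limit a block must respect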
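
    Once the limit is known, the block size can be derived from it rather than from a constant. The following is only a minimal sketch, not part of the original answer: `squareBlockSide` is a hypothetical helper, and rounding down to a power of two is an assumption made for convenience. In real host code its argument would come from the `maxThreadsPerBlock` field of `cudaDeviceProp`.

    ```cpp
    #include <cassert>

    // Hypothetical helper: largest power-of-two side s such that an
    // s x s thread block holds at most maxThreadsPerBlock threads.
    unsigned squareBlockSide(unsigned maxThreadsPerBlock) {
        assert(maxThreadsPerBlock >= 1);
        unsigned side = 1;
        // Keep doubling while the next square still fits under the limit.
        // The product is computed in 64 bits so it cannot overflow.
        unsigned long long next = 2ULL * side;
        while (next * next <= maxThreadsPerBlock) {
            side *= 2;
            next = 2ULL * side;
        }
        return side;
    }
    ```

    With a limit of 1024 this yields a side of 32, and with a limit of 512 it yields 16, so `dim3 dimBlock(side, side)` would stay within the per-block limit in both cases.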
    
  • 2020-12-01 16:28

    This type of error message frequently refers to the launch configuration parameters (grid/threadblock dimensions in this case, could also be shared memory, etc. in other cases). When you see a message like this it's a good idea just to print out your actual config parameters before launching the kernel, to see if you've made any mistakes.

    You said BLOCKDIM = 512. You have threadNum = BLOCKDIM/8 so threadNum = 64. Your threadblock configuration is:

    dim3 dimBlock(threadNum,threadNum);
    

    So you are asking to launch blocks of 64 x 64 threads, that is, 4096 threads per block. That won't work on any generation of CUDA devices. All current CUDA devices are limited to a maximum of 1024 threads per block, and that limit applies to the product of the 3 block dimensions.
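
    The arithmetic above can be checked on the host before the launch. This is a hedged sketch rather than code from the question: the helper names are hypothetical, the dimensions are passed as plain integers so the snippet compiles without the CUDA toolkit, and the 1024 limit is hard-coded only because it is the figure quoted here (real code would read it from the device properties).

    ```cpp
    #include <cassert>

    // Assumed limit, taken from the text above; query the device in real code.
    constexpr unsigned kMaxThreadsPerBlock = 1024;

    // Threads in one block = product of the three block dimensions.
    unsigned threadsPerBlock(unsigned x, unsigned y, unsigned z = 1) {
        return x * y * z;
    }

    // True if a block of x * y * z threads stays within the limit.
    bool blockFits(unsigned x, unsigned y, unsigned z = 1,
                   unsigned limit = kMaxThreadsPerBlock) {
        unsigned n = threadsPerBlock(x, y, z);
        return n >= 1 && n <= limit;
    }

    // Blocks needed along one axis to cover `extent` elements with `side`
    // threads per block (rounds up, like the expression in the question).
    unsigned blocksFor(unsigned extent, unsigned side) {
        assert(side > 0);
        return (extent + side - 1) / side;
    }
    ```

    Here `blockFits(64, 64)` is false because 64 x 64 = 4096 exceeds 1024, while `blockFits(32, 32)` is true. With a 32 x 32 block, a hypothetical width of 1000 elements needs `blocksFor(1000, 32)`, which is 32 blocks, along that axis.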

    Maximum dimensions are listed in table 14 of the CUDA programming guide, and also available via the deviceQuery CUDA sample code.
