cudaMemcpy error when copying from device to host after __device__ class member function alters value of device variable

佛祖请我去吃肉
佛祖请我去吃肉 2020-12-12 03:03

I am confused as to the behavior of the CUDA code I have written. I am in the midst of writing tests for my __device__ functions in a class called DimmedG

1 answer
  • 2020-12-12 03:35

    So, why does this line (the last cudaMemcpy) throw the error "cudaErrorInvalidValue"?

    The problem revolves around your destructor:

      ~DimmedGridGPU(){
    

    The destructor is getting called in places you probably aren't expecting. To convince yourself of this, add a printf statement to the destructor. Note where it appears in the printout:

    $ ./t955
    g.grid_[0] is now 0.000000
    g.grid_[1] is now 1.000000
    g.grid_[2] is now 2.000000
    g.grid_[3] is now 3.000000
    g.grid_[4] is now 4.000000
    g.grid_[5] is now 5.000000
    g.grid_[6] is now 6.000000
    g.grid_[7] is now 7.000000
    g.grid_[8] is now 8.000000
    g.grid_[9] is now 9.000000
    g.grid_[10] is now 10.000000
    Destructor!
    get_index was called on the GPU in 1 dimension(s)
    xi is now 3.500000, min_[i] is 0.000000 and dx_[i] is 1.000000
    do_get_value was called on the GPU!, and index[0] is now 3
    but multi2one(index) gives us 3
    and value to be returned is 3.000000
    get_value_kernel has set target[0] to be 3.000000
    GPUassert: "cudaErrorInvalidValue": invalid argument t955.cu 167
    

    Given that, it should be evident that calling cudaDeviceReset() in that destructor is a bad idea. cudaDeviceReset() destroys all device allocations, so when you then attempt to do this:

    gpuErrchk(cudaMemcpy(target, d_target, sizeof(double), cudaMemcpyDeviceToHost));
    

    d_target is no longer a valid allocation on the device. When you pass it to cudaMemcpy as the device source pointer, the runtime checks the pointer value (which the device reset does not change), determines that it no longer corresponds to a valid allocation, and returns a runtime error.

    Just as in ordinary C++, when you pass an object to a function (or, in this case, a kernel) by value, the copy constructor for that object is called. It follows that when that copy goes out of scope, its destructor is called as well.

    I would suggest that putting such global-scope affecting functions as cudaDeviceReset() in an object destructor might be a fragile programming paradigm, but that is perhaps a matter of opinion. I assume you now have enough information to go about fixing the issue.

    To avoid the next possible question, simply commenting out that call to cudaDeviceReset() in your destructor may not be sufficient to make all problems disappear (although this particular one will). Now that you know that this destructor is being called at least twice in the ordinary execution of this program, you may want to think carefully about what else is going on in that destructor, and perhaps strip more things out of it, or else rearchitect your class altogether.

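    For example, note that cudaDeviceReset() is not the only function that can cause trouble in a destructor for objects used this way. Similarly, cudaFree() may have unintended consequences for the original object when it runs in the destructor of the object copy: if the copy holds the same device pointers as the original (as it does with a default member-wise copy), freeing them leaves the original with dangling pointers.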
