I am confused as to the behavior of the CUDA code I have written. I am in the midst of writing tests for my __device__
functions in a class called DimmedG
So, why does this line (the last cudaMemcpy) throw an error "CudaErrorInvalidValue"...?
The problem revolves around your destructor:
~DimmedGridGPU(){
The destructor is getting called in places you probably aren't expecting. To convince yourself of this, add a printf
statement to the destructor. Note where it appears in the printout:
$ ./t955
g.grid_[0] is now 0.000000
g.grid_[1] is now 1.000000
g.grid_[2] is now 2.000000
g.grid_[3] is now 3.000000
g.grid_[4] is now 4.000000
g.grid_[5] is now 5.000000
g.grid_[6] is now 6.000000
g.grid_[7] is now 7.000000
g.grid_[8] is now 8.000000
g.grid_[9] is now 9.000000
g.grid_[10] is now 10.000000
Destructor!
get_index was called on the GPU in 1 dimension(s)
xi is now 3.500000, min_[i] is 0.000000 and dx_[i] is 1.000000
do_get_value was called on the GPU!, and index[0] is now 3
but multi2one(index) gives us 3
and value to be returned is 3.000000
get_value_kernel has set target[0] to be 3.000000
GPUassert: "cudaErrorInvalidValue": invalid argument t955.cu 167
Given that, it should be evident that calling cudaDeviceReset() in that destructor is a bad idea. cudaDeviceReset() wipes out all device allocations, so when you then attempt this:
gpuErrchk(cudaMemcpy(target, d_target, sizeof(double), cudaMemcpyDeviceToHost));
d_target is no longer a valid allocation on the device. When you pass it to cudaMemcpy as the device source pointer, the runtime checks the pointer value (which the device reset does not change), determines that it no longer corresponds to a valid allocation, and returns a runtime error.
Just as in ordinary C++, when you pass an object to a function (or, in this case, a kernel) as a pass-by-value parameter, the copy constructor for that object is called. It follows that when that copy goes out of scope, its destructor is called as well.
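The copy-then-destroy sequence can be reproduced on the host without any CUDA calls. The following is a minimal sketch, not the code from the question: Grid, use_by_value and destructions_before_original_dies are hypothetical names, and an ordinary function stands in for the kernel launch.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the class in the question; it only counts calls.
struct Grid {
    static int copies;     // number of copy-constructor calls
    static int destroyed;  // number of destructor calls
    Grid() = default;
    Grid(const Grid&) { ++copies; }
    ~Grid() { ++destroyed; std::printf("Destructor!\n"); }
};
int Grid::copies = 0;
int Grid::destroyed = 0;

// Pass-by-value parameter, like the kernel argument: it receives a copy.
void use_by_value(Grid g) { (void)g; }

// Returns how many destructor calls happen while the original object is
// still alive.
int destructions_before_original_dies() {
    Grid::copies = 0;
    Grid::destroyed = 0;
    Grid g;
    use_by_value(g);         // a copy is made, then destroyed after the call
    assert(Grid::copies == 1);
    return Grid::destroyed;  // g itself has not been destroyed yet
}
```

Calling destructions_before_original_dies() from main shows one destructor call while the original is still in use, and a second when the original goes out of scope. Anything with global effect in that destructor therefore runs earlier than expected.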
I would suggest that putting functions with global-scope effects, such as cudaDeviceReset(), in an object destructor is a fragile programming paradigm, but that is perhaps a matter of opinion. I assume you now have enough information to go about fixing the issue.
To avoid the next possible question: simply commenting out that call to cudaDeviceReset() in your destructor may not be sufficient to make all problems disappear (although this particular one will). Now that you know this destructor is called at least twice in the ordinary execution of this program, you may want to think carefully about what else is going on in that destructor, and perhaps strip more things out of it, or else rearchitect your class altogether.
For example, note that cudaDeviceReset() is not the only function that can cause trouble in a destructor for objects used this way. Similarly, cudaFree() may have unintended consequences for the original object when it is used in a destructor called on the object copy.
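One possible way to rearchitect the class is to separate ownership from access: a non-copyable owner frees the buffer exactly once, and a small struct with no destructor is what gets passed by value. The sketch below is a host-only illustration under stated assumptions. GridOwner, GridView, read_at and fill_and_read are hypothetical names, and std::malloc/std::free stand in for cudaMalloc/cudaFree.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Trivially copyable, non-owning view: safe to pass by value because it has
// no destructor and therefore never frees anything.
struct GridView {
    double* data;
    std::size_t size;
};

// Owner of the buffer. std::malloc/std::free stand in for cudaMalloc/cudaFree.
class GridOwner {
public:
    explicit GridOwner(std::size_t n)
        : data_(static_cast<double*>(std::malloc(n * sizeof(double)))), size_(n) {
        assert(data_ != nullptr);
    }
    ~GridOwner() { std::free(data_); }              // runs once per buffer
    GridOwner(const GridOwner&) = delete;           // no accidental copies
    GridOwner& operator=(const GridOwner&) = delete;
    GridView view() const { return GridView{data_, size_}; }
private:
    double* data_;
    std::size_t size_;
};

// By-value parameter, as a kernel would take it; copying the view frees nothing.
double read_at(GridView v, std::size_t i) { return v.data[i]; }

// Fills the grid with 0, 1, 2, ... and reads element i twice through copies
// of the view; the second read works because no copy released the buffer.
double fill_and_read(std::size_t n, std::size_t i) {
    GridOwner owner(n);
    GridView v = owner.view();
    for (std::size_t k = 0; k < n; ++k) v.data[k] = static_cast<double>(k);
    double first = read_at(v, i);
    double again = read_at(v, i);
    assert(first == again);
    return again;
}
```

With this split, passing the view to a function any number of times leaves the allocation intact, and the buffer is released only when the owner itself goes out of scope. Deleting the copy operations also turns an accidental pass-by-value of the owner into a compile-time error.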