Question
A short description of my problem is as follows:
I developed a function that calls a CUDA kernel. My function receives pointers to the host data buffers (the kernel's input and output) and has no control over how these buffers were allocated.
It is possible that the host data was allocated with either malloc or cudaHostAlloc, and my function is not told which allocation method was used.
The question is: what is a feasible way for my function to figure out whether the host buffers are pinned/page-locked (cudaHostAlloc) or not (regular malloc)?
The reason I am asking: if the buffers are not page-locked, I would like to pin them with cudaHostRegister() so that they become usable with streams.
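For clarity, the call I have in mind is roughly the following (buf and nbytes are placeholders standing for my function's arguments, not my actual variable names):

// hypothetical names: 'buf' and 'nbytes' stand for my function's arguments
cudaHostRegister(buf, nbytes, cudaHostRegisterDefault); // pin the caller's buffer
// ... overlapped copies / kernel launches via streams ...
cudaHostUnregister(buf); // restore the buffer to its original pageable state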
I have tried three approaches, all of which have failed:

1. Always apply cudaHostRegister(): this is no good if the host buffers are already pinned.
2. Run cudaPointerGetAttributes(), and if the returned error is cudaSuccess, the buffers are already pinned and there is nothing to do; if it is cudaErrorInvalidValue, apply cudaHostRegister(). For some reason this approach results in the kernel execution returning an error.
3. Run cudaHostGetFlags(), and if the return value is not a success, apply cudaHostRegister(). Same behavior as 2.
In cases 2 and 3, the error is "invalid argument".
Note that my code currently does not use streams; it always calls cudaMemcpy() on the entire host buffers. If I do not apply any of the three approaches above, my code runs to completion regardless of whether the host buffers are pinned.
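For reference, the stream-based pattern I eventually want to enable would look roughly like this (the device pointers, sizes, and kernel name are placeholders):

cudaStream_t stream;
cudaStreamCreate(&stream);
// cudaMemcpyAsync can overlap with other work only when the host buffer is pinned
cudaMemcpyAsync(d_in, h_in, nbytes, cudaMemcpyHostToDevice, stream);
mykernel<<<grid, block, 0, stream>>>(d_in, d_out, n);
cudaMemcpyAsync(h_out, d_out, nbytes, cudaMemcpyDeviceToHost, stream);
cudaStreamSynchronize(stream); // wait for the copies and kernel to finish
cudaStreamDestroy(stream);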
Any advice? Many thanks in advance.
Answer 1:
Your method 2 should work (I think method 3 should work as well). It's likely that you are getting confused by how to do proper CUDA error checking in this scenario.

Since you have a runtime API call that is failing, if you do something like cudaGetLastError after the kernel call, it will show the runtime API failure that occurred previously, on the cudaPointerGetAttributes() call. In your case this is not necessarily catastrophic. Since you know the error occurred and have handled it correctly, what you want to do is clear that error out. You can do that with an extra call to cudaGetLastError (this works for this type of "non-sticky" API error, i.e. an API error that does not imply a corrupted CUDA context).
Here's a fully worked example:
$ cat t642.cu
#include <stdio.h>
#include <stdlib.h>

#define DSIZE 10
#define nTPB 256

#define cudaCheckErrors(msg) \
    do { \
        cudaError_t __err = cudaGetLastError(); \
        if (__err != cudaSuccess) { \
            fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                msg, cudaGetErrorString(__err), \
                __FILE__, __LINE__); \
            fprintf(stderr, "*** FAILED - ABORTING\n"); \
            exit(1); \
        } \
    } while (0)

// fill data[i] = i on the device
__global__ void mykernel(int *data, int n){
    int idx = threadIdx.x+blockDim.x*blockIdx.x;
    if (idx < n) data[idx] = idx;
}

int my_func(int *data, int n){
    cudaPointerAttributes my_attr;
    // on an ordinary malloc'd pointer, cudaPointerGetAttributes
    // returns cudaErrorInvalidValue, so pin the buffer ourselves
    if (cudaPointerGetAttributes(&my_attr, data) == cudaErrorInvalidValue) {
        cudaGetLastError(); // clear out the previous API error
        cudaHostRegister(data, n*sizeof(int), cudaHostRegisterPortable);
        cudaCheckErrors("cudaHostRegister fail");
    }
    int *d_data;
    cudaMalloc(&d_data, n*sizeof(int));
    cudaCheckErrors("cudaMalloc fail");
    cudaMemset(d_data, 0, n*sizeof(int));
    cudaCheckErrors("cudaMemset fail");
    mykernel<<<(n+nTPB-1)/nTPB, nTPB>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaCheckErrors("kernel fail");
    cudaMemcpy(data, d_data, n*sizeof(int), cudaMemcpyDeviceToHost);
    cudaCheckErrors("cudaMemcpy fail");
    int result = 1;
    for (int i = 0; i < n; i++) if (data[i] != i) result = 0;
    return result;
}

int main(int argc, char *argv[]){
    int *h_data;
    int mysize = DSIZE*sizeof(int);
    int use_pinned = 0;
    // pass 1 on the command line to allocate pinned memory instead of malloc
    if (argc > 1) if (atoi(argv[1]) == 1) use_pinned = 1;
    if (!use_pinned) h_data = (int *)malloc(mysize);
    else {
        cudaHostAlloc(&h_data, mysize, cudaHostAllocDefault);
        cudaCheckErrors("cudaHostAlloc fail");
    }
    if (!my_func(h_data, DSIZE)) {printf("fail!\n"); return 1;}
    printf("success!\n");
    return 0;
}
$ nvcc -o t642 t642.cu
$ ./t642
success!
$ ./t642 1
success!
$
In your case, I believe you have not properly handled the API error as I have done on the line where I placed the comment:
// clear out the previous API error
If you omit this step (you can try commenting it out), then when you run the code in case 0 (i.e. without pinning the memory prior to the function call), you will appear to get a "spurious" error at the next error-check step (after the next API call in my case, but possibly after the kernel call in your case).
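One additional note, not shown in the example above: since my_func registers a buffer it does not own, it would be good practice to unregister the buffer before returning (remembering with a local flag whether the registration path was taken), so the caller's allocation is left in the state the caller expects:

// only for buffers that my_func itself registered
cudaHostUnregister(data);
cudaCheckErrors("cudaHostUnregister fail");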
Source: https://stackoverflow.com/questions/28861161/cuda-find-out-if-host-buffer-is-pinned-page-locked