Question
I am trying to parallelize a C function using CUDA. Several structs are passed to this function as pointers.
To use unified memory, I have replaced the malloc() calls with cudaMallocManaged().
However, there is also an allocation made with memalign(). I would like to achieve for it what cudaMallocManaged() did for malloc().
Does such an equivalent exist? If not, what needs to be done?
This is what the memalign() allocation looks like:
float *data = (float*) memalign(16, some_integer*sizeof(float));
Answer 1:
You should be able to register an existing host memory buffer like this:
float *data = (float*) memalign(16, some_integer*sizeof(float));
cudaHostRegister((void *)data, some_integer*sizeof(float), cudaHostRegisterDefault);
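After registration, data is page-locked and should be usable from the GPU in much the same way as memory allocated with cudaMallocManaged (it remains pinned host memory rather than migrating managed memory). Check the return value of the cudaHostRegister call; if it fails, you may have chosen an incompatible alignment.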
Source: https://stackoverflow.com/questions/31986116/equivalent-of-memalign-in-cuda