OpenCL buffer allocation and mapping best practice

Submitted by 萝らか妹 on 2020-01-16 07:41:13

Question


I am a little confused as to whether my code using OpenCL mapped buffers is correct.

I have two examples, one using CL_MEM_USE_HOST_PTR and one using CL_MEM_ALLOC_HOST_PTR. Both work and run on my local machine and OpenCL devices, but I am interested in whether this is the correct way of doing the mapping, and whether it should work on all OpenCL devices. I am especially unsure about the USE_HOST_PTR example.

I am only interested in the buffer/map specific operations. I am aware I should do error checking and so forth.

CL_MEM_ALLOC_HOST_PTR:

// pointer to hold the result
int * host_ptr = malloc(size * sizeof(int));
int i;

d_mem = clCreateBuffer(context,CL_MEM_READ_WRITE|CL_MEM_ALLOC_HOST_PTR,
                       size*sizeof(cl_int), NULL, &ret);

int * map_ptr = clEnqueueMapBuffer(command_queue,d_mem,CL_TRUE,CL_MAP_WRITE,
                                   0,size*sizeof(int),0,NULL,NULL,&ret);
// initialize data
for (i=0; i<size;i++) {
  map_ptr[i] = i;
}

ret = clEnqueueUnmapMemObject(command_queue,d_mem,map_ptr,0,NULL,NULL); 

//Set OpenCL Kernel Parameters
ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&d_mem);

size_t global_work[1]  = { size };
//Execute OpenCL Kernel
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
                             global_work, NULL, 0, NULL, NULL);

map_ptr = clEnqueueMapBuffer(command_queue,d_mem,CL_TRUE,CL_MAP_READ,
                             0,size*sizeof(int),0,NULL,NULL,&ret);
// copy the data to result array 
for (i=0; i<size;i++){
  host_ptr[i] = map_ptr[i];
} 

ret = clEnqueueUnmapMemObject(command_queue,d_mem,map_ptr,0,NULL,NULL);        

// cl finish etc     

CL_MEM_USE_HOST_PTR:

// pointer to hold the result
int * host_ptr = malloc(size * sizeof(int));
int i;
for(i=0; i<size;i++) {
  host_ptr[i] = i;
}

d_mem = clCreateBuffer(context,CL_MEM_READ_WRITE|CL_MEM_USE_HOST_PTR,
                       size*sizeof(cl_int), host_ptr, &ret);

// No need to map or unmap here: since we use CL_MEM_USE_HOST_PTR, the
// buffer is initialized from host_ptr at creation time?

//Set OpenCL Kernel Parameters
ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&d_mem);

size_t global_work[1]  = { size };
//Execute OpenCL Kernel
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
                             global_work, NULL, 0, NULL, NULL);

// This returns a pointer into host_ptr (I assume it always will?),
// but we do need to call the map function to ensure the data is
// copied back. There's no need to manually copy it into host_ptr,
// as the mapped region is host_ptr's own memory. We still save the
// returned pointer so we can pass it to the unmap call below.
int * map_ptr = clEnqueueMapBuffer(command_queue,d_mem,CL_TRUE,CL_MAP_READ,
                                   0,size*sizeof(int),0,NULL,NULL,&ret);

ret = clEnqueueUnmapMemObject(command_queue,d_mem,map_ptr,0,NULL,NULL);        

// cl finish, cleanup etc

Answer 1:


If you use CL_MEM_ALLOC_HOST_PTR, there is a chance that the underlying OpenCL implementation uses page-locked memory.

That means that the page cannot be swapped out to disk and that the transfer between host and device memory would be done DMA style without wasting CPU cycles. Therefore in this case CL_MEM_ALLOC_HOST_PTR would be the best solution.

NVIDIA has the page-locked (pinned) memory feature and should also use it in their OpenCL implementation. For AMD it's not certain whether they do the same. Check here for more details.

Using CL_MEM_USE_HOST_PTR mainly makes the programmer's life easier, so in the unlikely case that the hardware cannot use page-locked memory, you could fall back to this option.



Source: https://stackoverflow.com/questions/26277268/opencl-buffer-allocation-and-mapping-best-practice
