Ideas for CUDA kernel calls with parameters exceeding 256 bytes

Submitted by 核能气质少年 on 2019-12-10 17:26:58

Question


I have a couple of structures whose combined size exceeds the 256 bytes allowed for parameters in a kernel call.

Both structures are already allocated and copied to device global memory.

1) How can I use these structures in the same kernel without passing them as parameters?

More details: each structure on its own fits within the limit and can be passed as a parameter, for example to different kernels. But:

2) How can I use both structures in the same kernel?


Answer 1:


As Robert Crovella suggested in his comment, you should be able to just pass a pointer to those areas. I had a similar problem in OpenCL. This is how I implemented the structs:

(My kernel and host functions are written in OpenCL, so the syntax will differ for you, but the idea is the same.)

The following two structs are defined in my host code, 'Mapper.c':

typedef struct data
{
  double dattr[10];
  int d_id;
  int bestCent;
}Data;


typedef struct cent
{
  double cattr[5];
  int c_id;
}Cent;

Data *dataNode;
Cent *centNode;

After allocating buffers in the device's global memory, I transferred the data. Because the kernel source is compiled separately from the host code, I had to repeat the struct definitions in the kernel file, as below:

mapper.cl:

#pragma OPENCL EXTENSION cl_khr_fp64 : enable
typedef struct data
{
  double dattr[10];
  int d_id;
  int bestCent;
}Data;


typedef struct cent
{
  double cattr[5];
  int c_id;
}Cent;

__kernel void mapper(__global int *keyMobj, __global int *valueMobj, __global Data *dataMobj, __global Cent *centMobj)
{
    int i = get_global_id(0);   // one work item per data point
    int j, k, color = 0;
    double dmin = 1000000.0, dx;

    // Find the nearest centroid; 2 centroids and 2 attributes are hard-coded here.
    for (j = 0; j < 2; j++)
    {
        dx = 0.0;
        for (k = 0; k < 2; k++)
        {
            double diff = centMobj[j].cattr[k] - dataMobj[i].dattr[k];
            dx += diff * diff;   // squared Euclidean distance
        }
        if (dx < dmin)
        {
            color = j;
            dmin = dx;
        }
    }
    keyMobj[i] = color;                // index of the nearest centroid
    valueMobj[i] = dataMobj[i].d_id;   // id of this data point
}

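You can see that I have passed only pointers to those areas: all four arguments (keyMobj, valueMobj, dataMobj and centMobj) are __global pointers to buffers in device memory, so the size of the structs does not matter for the argument list.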

kernel = clCreateKernel(program, "mapper", &ret);
// Each argument is a cl_mem buffer handle, so only sizeof(cl_mem) bytes are passed per argument.
ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&keyMobj);
ret = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&valueMobj);
ret = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&dataMobj);
ret = clSetKernelArg(kernel, 3, sizeof(cl_mem), (void *)&centMobj);

The lines above belong to the host-side code (mapper.c), which creates the kernel from mapper.cl; the four clSetKernelArg calls then pass the arguments to the kernel function.




Answer 2:


If your data structures are already in global memory, then you can just pass a pointer in as the kernel argument.

On a related note, the limit for kernel arguments is 4 KB on devices of compute capability 2.x and higher:

__global__ function parameters are passed to the device:

  • via shared memory and are limited to 256 bytes on devices of compute capability 1.x,
  • via constant memory and are limited to 4 KB on devices of compute capability 2.x and higher.

__device__ and __global__ functions cannot have a variable number of arguments.

(cf. http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#function-parameters)



Source: https://stackoverflow.com/questions/21895167/ideas-for-cuda-kernel-calls-with-parameters-exceeding-256-bytes
