Passing a Class to a Kernel in Intel OpenCL

Submitted by 亡梦爱人 on 2019-12-04 21:22:54

OpenCL C, the kernel language, is based on C99, so you can pass structs to a kernel, but not classes.

As huseyin tugrul buyukisik says, you can use SYCL, which supports C++14 (or thereabouts).

Alternatively, if you want to support both NVIDIA® CUDA™ and OpenCL, you could write it only in NVIDIA® CUDA™, and then use https://github.com/hughperkins/cuda-on-cl to run the NVIDIA® CUDA™ code on OpenCL 1.2 GPU devices. Full disclosure: I'm the author of cuda-on-cl, and it's a bit of a work-in-progress for now. It does work though, with some caveats/limitations. It can handle full-blown C++11, templates, classes etc. For example, it can be used to compile and run Eigen on OpenCL 1.2 GPUs https://bitbucket.org/hughperkins/eigen/src/eigen-cl/unsupported/test/cuda-on-cl/?at=eigen-cl

If SYCL (or Hugh Perkins's nice solution) is not an option for you and your class doesn't have any methods, you can use structs instead (serialize them to a byte array when copying to the device):

typedef struct Warrior_tag
{
    int id;
    float hp;
    int strength;
    int dexterity;
    int constitution;
} Warrior;

typedef struct Mage_tag
{
    int id;
    Warrior summoned_warriors[90];
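} Mage;
// Size estimate if the device pads each Warrior to 32 bytes:
// the array takes 32*90 = 2880 bytes, and id plus its padding may take as much again
// (2880*2 = 5760), which rounds up to 8192 bytes; a smaller buffer does not work at run time.
// Reversing the order of the fields should bring it down to about 4096 + 4 bytes.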


__kernel void test0(__global Warrior * warriors)
{
    int id=get_global_id(0);
    Warrior battal_gazi = warriors[0];
    Warrior achilles = warriors[1];
    Warrior aramis = warriors[2];
    Warrior hattori_hanzo = warriors[3];
    Warrior ip_man = warriors[4];

    Mage nakano = (Mage){0,{battal_gazi, achilles}};
    Mage gul_dan = (Mage){0,{aramis , hattori_hanzo,ip_man  }};
}

You are then responsible for handling the alignment and sizes of the structs. For example, the fields of the Warrior struct total 20 bytes, but the struct may occupy more on the device side (likely 32 bytes, if the implementation pads it to a power of 2), so the host has to know the device-side layout and place the data to match its alignment and field sizes. Endianness is a further pain for "write once, run everywhere" code, so it is safest to run this only on the machine it was tuned for.
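As a rough illustration of the host-side bookkeeping described above, here is a minimal sketch in plain C. It mirrors the kernel's Warrior with fixed-width types, checks the layout at compile time, and copies each struct into a fixed-size slot of a byte array. The names WarriorHost and pack_warriors are hypothetical, and the 32-byte stride used in the note below is only the figure from the text, not something the OpenCL specification guarantees.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>

/* Host mirror of the kernel's Warrior (hypothetical name).
   int32_t matches OpenCL C's 32-bit int; float is assumed to be 32-bit on the host. */
typedef struct {
    int32_t id;
    float   hp;
    int32_t strength;
    int32_t dexterity;
    int32_t constitution;
} WarriorHost;

/* Compile-time layout checks: fail the build instead of corrupting data at run time. */
_Static_assert(sizeof(float) == 4, "host float must be 32-bit");
_Static_assert(sizeof(WarriorHost) == 20, "five 4-byte fields, no padding expected");
_Static_assert(offsetof(WarriorHost, hp) == 4, "hp must follow id");
_Static_assert(offsetof(WarriorHost, constitution) == 16, "constitution must be last");

/* Serialize n warriors into dst, one per `stride`-byte slot.
   `stride` is the per-struct size the device expects (an assumption the caller supplies);
   dst must hold n * stride bytes. Padding bytes are zeroed. */
static void pack_warriors(const WarriorHost *src, size_t n,
                          uint8_t *dst, size_t stride)
{
    assert(stride >= sizeof(WarriorHost));
    memset(dst, 0, n * stride);
    for (size_t i = 0; i < n; ++i)
        memcpy(dst + i * stride, &src[i], sizeof(WarriorHost));
}
```

Calling pack_warriors(ws, count, buf, 32) would fill a buffer laid out in 32-byte slots. One way to find the real stride is to have a small kernel write sizeof(Warrior) into a buffer and read it back on the host.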

Put the biggest fields at the top of the struct and the smaller ones at the bottom, and work out their in-struct alignment as powers of 2 too. Keep an eye on float3, int3 and similar types, whose layout is easy to miss: in OpenCL C a 3-component vector has the size and alignment of the 4-component one, so a float3 occupies the same 16 bytes as a float4. If global memory access performance is not important to you, you can simply pick one large size N for every struct smaller than N, and store relative byte offsets at the beginning of each struct: for example, the byte offset of the hp field at the top of a Warrior struct (four one-byte offsets packed into a single int). The device side can then look up the byte at which each field starts. (Endianness can make this trickier, so don't use plain buffer copies for raw structs.)
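One possible reading of the offset idea above, sketched in plain C: each struct gets a fixed slot of N bytes, and the first 32-bit word of the slot holds up to four one-byte field offsets. The names SLOT_BYTES, pack_offsets4, unpack_offset, slot_write_float and slot_read_float are hypothetical, and N = 32 is an assumption. Shifts are used instead of byte-wise casts, so packing and unpacking agree as long as the header word is written and read as a 32-bit int with the same byte order on both sides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SLOT_BYTES 32u  /* assumed fixed slot size N for every struct */

/* Pack four byte offsets (each 0..255) into one 32-bit header word. */
static uint32_t pack_offsets4(uint8_t o0, uint8_t o1, uint8_t o2, uint8_t o3)
{
    return (uint32_t)o0 | ((uint32_t)o1 << 8) |
           ((uint32_t)o2 << 16) | ((uint32_t)o3 << 24);
}

/* Recover the offset of field `index` (0..3) from the header word. */
static uint8_t unpack_offset(uint32_t header, unsigned index)
{
    assert(index < 4);
    return (uint8_t)(header >> (8u * index));
}

/* Store a float field at `offset` bytes from the start of the slot. */
static void slot_write_float(uint8_t *slot, uint8_t offset, float value)
{
    assert((size_t)offset + sizeof value <= SLOT_BYTES);
    memcpy(slot + offset, &value, sizeof value);
}

/* Load a float field from `offset` bytes into the slot. */
static float slot_read_float(const uint8_t *slot, uint8_t offset)
{
    float value;
    assert((size_t)offset + sizeof value <= SLOT_BYTES);
    memcpy(&value, slot + offset, sizeof value);
    return value;
}
```

A kernel could apply the same shift-and-mask to the header word to find where hp starts, at the cost of an extra read per field access, which is why this only suits cases where global memory performance is not critical.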

If aligning the struct fields on the host side is not an option:

  • send arrays of fields to a constructor kernel (float array -> hp, int array -> id)
  • construct the structs on the device with that kernel (the struct buffer lives only on the device side; each Warrior is built from the arrays of hp, id, ...)
  • don't fiddle with alignments or sizes anymore; just make the buffer large enough to fit all the structs. 32 bytes * number of warriors should be enough for a Warrior array.
  • when it works, use another kernel to expand the structs back into arrays on the device side, then return the results to the host as arrays.
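The steps above could look roughly like the following pair of kernels. This is a sketch, not a tested device implementation: the kernel names build_warriors and expand_warriors and the parameter names are hypothetical. The block at the top is a host-only shim (also an assumption, added purely so the same source compiles as plain C for a quick check); an OpenCL compiler defines __OPENCL_VERSION__ and skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shim: lets this file compile as plain C for testing. */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
static size_t shim_global_id;  /* set by the host test loop */
static size_t get_global_id(unsigned dim) { (void)dim; return shim_global_id; }
#endif

typedef struct Warrior_tag
{
    int id;
    float hp;
    int strength;
    int dexterity;
    int constitution;
} Warrior;

/* Constructor kernel: each work-item builds one Warrior from per-field arrays.
   The warriors buffer only needs to exist on the device. */
__kernel void build_warriors(__global const int *ids,
                             __global const float *hps,
                             __global const int *strengths,
                             __global const int *dexterities,
                             __global const int *constitutions,
                             __global Warrior *warriors)
{
    size_t i = get_global_id(0);
    Warrior w;
    w.id = ids[i];
    w.hp = hps[i];
    w.strength = strengths[i];
    w.dexterity = dexterities[i];
    w.constitution = constitutions[i];
    warriors[i] = w;
}

/* Expansion kernel: each work-item splits one Warrior back into per-field arrays,
   which the host can then read without knowing the struct layout. */
__kernel void expand_warriors(__global const Warrior *warriors,
                              __global int *ids,
                              __global float *hps,
                              __global int *strengths,
                              __global int *dexterities,
                              __global int *constitutions)
{
    size_t i = get_global_id(0);
    Warrior w = warriors[i];
    ids[i] = w.id;
    hps[i] = w.hp;
    strengths[i] = w.strength;
    dexterities[i] = w.dexterity;
    constitutions[i] = w.constitution;
}
```

Because only arrays of int and float cross the host/device boundary, the host never has to reproduce the device's struct padding; it only has to allocate a struct buffer that is large enough.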