Prefetch in CUDA (through C code)

眼角桃花 2021-01-02 11:48

I am working on data prefetching in CUDA (on a Fermi GPU) from C code. The CUDA reference manual describes prefetching at the PTX level, not at the C level.

Can anyon

2 answers
  •  心在旅途
    2021-01-02 12:36

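    According to the PTX manual, prefetch is an instruction of the form prefetch{.space}.level [a]; where .space is .global or .local and .level is .L1 or .L2. It brings the cache line containing the given address into the named cache level.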

    You can embed PTX instructions in a CUDA kernel with inline assembly. Here is a tiny sample from NVIDIA's documentation:

    __device__ int cube (int x)
    {
      int y;
      asm("{\n\t"                       // use braces for local scope
          " .reg .u32 t1;\n\t"           // temp reg t1,
          " mul.lo.u32 t1, %1, %1;\n\t" // t1 = x * x
          " mul.lo.u32 %0, t1, %1;\n\t" // y = t1 * x
          "}"
          : "=r"(y) : "r" (x));
      return y;
    }
    

    Following the same pattern, you can write a prefetch function in C like this:

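    __device__ void prefetch_l1 (const void *addr)
    {
      // prefetch is only a hint and writes no register, so there is no output
      // operand; "l" passes a 64-bit address (use "r" in a 32-bit build).
      // volatile keeps the compiler from deleting the statement.
      asm volatile("prefetch.global.L1 [%0];" : : "l"(addr));
    }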
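
    As a rough illustration of how such a helper might be used, here is a minimal sketch of a grid-stride kernel that prefetches the element each thread will read on its next iteration. The kernel name scale, the host helper run_prefetch_demo, and the launch configuration are hypothetical and not part of the original answer. The sketch assumes a 64-bit build and that the pointer refers to global memory obtained from cudaMalloc. Since prefetch is only a hint, the results are the same with or without it, and any speedup has to be measured on the target GPU.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hint: pull the cache line containing addr into L1.
// Assumes a 64-bit build ("l" constraint) and a global-memory address.
__device__ __forceinline__ void prefetch_l1(const void *addr)
{
    asm volatile("prefetch.global.L1 [%0];" : : "l"(addr));
}

// Hypothetical kernel: grid-stride loop that prefetches the element this
// thread will read on its next iteration, then processes the current one.
__global__ void scale(const float *in, float *out, int n, float k)
{
    int stride = gridDim.x * blockDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        if (i + stride < n)
            prefetch_l1(in + i + stride);   // hint only, results are unchanged
        out[i] = k * in[i];
    }
}

// Hypothetical host helper: runs the kernel and returns the number of
// mismatching elements, or -1 on a CUDA error.
int run_prefetch_demo(int n)
{
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    scale<<<32, 128>>>(d_in, d_out, n, 2.0f);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0f * h_in[i]) ++bad;
    return bad;
}
```

    Calling run_prefetch_demo(1 << 20) from main and checking that it returns 0 is one way to confirm the inline assembly compiles and the kernel still produces correct results.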
    

    NOTE: prefetch requires a GPU of compute capability 2.0 or higher. Pass a compile flag that matches your GPU, for example -arch=sm_20 on Fermi.
