Simplest Possible Example to Show GPU Outperform CPU Using CUDA

悲哀的现实 2020-12-28 15:24

I am looking for the most concise code possible that can be written both for a CPU (using g++) and a GPU (using nvcc), and for which the GPU consistently outperforms the

4 answers
  •  灰色年华
    2020-12-28 15:45

    First off, I'll reiterate my comment: GPUs are high bandwidth, high latency. Trying to get the GPU to beat a CPU for a nanosecond job (or even a millisecond or second job) is completely missing the point of doing GPU stuff. Below is some simple code, but to really appreciate the performance benefits of a GPU, you'll need a big problem size to amortize the startup costs over... otherwise, it's meaningless. I can beat a Ferrari in a two-foot race, simply because it takes some time to turn the key, start the engine and push the pedal. That doesn't mean I'm faster than the Ferrari in any meaningful way.
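
    To put rough numbers on the Ferrari analogy, here is a minimal sketch of the amortization argument as a simple cost model in plain C++. The Device struct, both function names, and every figure mentioned afterwards are illustrative assumptions, not measurements of any real hardware.

```cpp
#include <cassert>

// Hypothetical cost model: a device pays a fixed startup cost once,
// then processes items at a steady rate.
struct Device {
    double startup_s;    // fixed cost in seconds (launch, transfers, ...)
    double items_per_s;  // sustained throughput once running
};

// Total time to process n items: startup plus n / throughput.
double total_time(const Device& d, double n) {
    return d.startup_s + n / d.items_per_s;
}

// Problem size at which both devices take the same total time.
// Only meaningful when 'fast' has the higher throughput.
double break_even_items(const Device& slow, const Device& fast) {
    double extra_startup = fast.startup_s - slow.startup_s;
    double per_item_gain = 1.0 / slow.items_per_s - 1.0 / fast.items_per_s;
    assert(per_item_gain > 0.0);
    return extra_startup / per_item_gain;
}
```

    With an assumed CPU at 0 s startup and 1e9 items/s, and an assumed GPU at 0.1 s startup and 1e11 items/s, break_even_items comes out at roughly 1e8 items: below that the CPU finishes first, above it the GPU does.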

    Use something like this in C++:

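      #include <cstdio>

      #define N (1024*1024)
      #define M (1000000)

      // Static storage: a 4 MB array may not fit on the default stack.
      static float data[N];

      int main()
      {
         for(int i = 0; i < N; i++)
         {
            data[i] = 1.0f * i / N;
            for(int j = 0; j < M; j++)
            {
               data[i] = data[i] * data[i] - 0.25f;
            }
         }
         // Print a user-chosen element so the computed values are actually used.
         int sel;
         printf("Enter an index: ");
         if(scanf("%d", &sel) != 1 || sel < 0 || sel >= N)
            return 1;
         printf("data[%d] = %f\n", sel, data[sel]);
         return 0;
      }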
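
    To compare the two versions you need a measured time for each. A minimal sketch of timing the same recurrence on the CPU with std::chrono, assuming wall-clock time is the metric you care about; iterate and timed_fill are hypothetical names, and a std::vector stands in for the fixed-size array.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// The per-element recurrence from the loop: x <- x*x - 0.25, repeated m times.
float iterate(float x, int m) {
    for (int j = 0; j < m; j++)
        x = x * x - 0.25f;
    return x;
}

// Fills every element of 'data' and returns the elapsed wall-clock seconds.
double timed_fill(std::vector<float>& data, int m) {
    const std::size_t n = data.size();
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; i++)
        data[i] = iterate(1.0f * i / n, m);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

    Start with small sizes (say, a vector of 1024 elements and m = 1000) and scale up, and read at least one element of the result afterwards so an optimizing compiler cannot drop the work.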
    

    Use something like this in CUDA/C:

      #include <stdio.h>

      #define N (1024*1024)
      #define M (1000000)

      __global__ void cudakernel(float *buf)
      {
         // One thread per element.
         int i = threadIdx.x + blockIdx.x * blockDim.x;
         buf[i] = 1.0f * i / N;
         for(int j = 0; j < M; j++)
            buf[i] = buf[i] * buf[i] - 0.25f;
      }

      // Static storage: a 4 MB array may not fit on the default stack.
      static float data[N];

      int main()
      {
         float *d_data;
         cudaMalloc(&d_data, N * sizeof(float));
         // N/256 blocks of 256 threads each; N is a multiple of 256.
         cudakernel<<<N/256, 256>>>(d_data);
         cudaMemcpy(data, d_data, N * sizeof(float), cudaMemcpyDeviceToHost);
         cudaFree(d_data);

         int sel;
         printf("Enter an index: ");
         if(scanf("%d", &sel) != 1 || sel < 0 || sel >= N)
            return 1;
         printf("data[%d] = %f\n", sel, data[sel]);
         return 0;
      }
    

    If the GPU version still isn't faster, try making N and M bigger, or changing the block size from 256 to 128 or 512.
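
    When the block size changes, the block count has to change with it so that blocks times threads still covers all N elements. A small sketch of that arithmetic in plain C++; blocks_for is a hypothetical helper, not part of the CUDA API.

```cpp
#include <cassert>

// Number of blocks needed so that blocks * threads_per_block >= n
// (ceiling division).
unsigned blocks_for(unsigned n, unsigned threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}
```

    For N = 1024*1024 this gives 4096 blocks at 256 threads, 8192 at 128, and 2048 at 512. When n is not a multiple of the block size, the last block has spare threads, so the kernel would need an `if (i < n)` guard before it touches buf[i].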
