Using `overlap`, `kernel time` and `utilization` to optimize one's kernels

Submitted by 女生的网名这么多〃 on 2019-12-12 02:53:40

Question


My kernel achieves 100% utilization, but the kernel time is only 3% and there is no time overlap between memory copies and kernels.

In particular, the combination of high utilization and low kernel time doesn't make sense to me.

So how should I proceed in optimizing my kernel?

I have already made sure that all my memory accesses are coalesced and that I use pinned memory, as the profiler recommended.

`Quadro FX 580 utilization = 100.00% (62117.00/62117.00)`

Kernel time = 3.05 % of total GPU time 
Memory copy time = 0.9 % of total GPU time
Kernel taking maximum time = Pinned (0.7% of total GPU time)
Memory copy taking maximum time = memcpyHtoD (0.5% of total GPU time)
There is no time overlap between memory copies and kernels on GPU

Furthermore, I have no warp serialization, no divergent branches, and no occupancy limiting factor.

Kernel details: Grid size: [4 1 1], Block size: [256 1 1]
Register Ratio: 0.9375 ( 7680 / 8192 ) [10 registers per thread]
Shared Memory Ratio: 0.09375 ( 1536 / 16384 ) [60 bytes per Block]
Active Blocks per SM: 3 (Maximum Active Blocks per SM: 8)
Active threads per SM: 768 (Maximum Active threads per SM: 768)
Potential Occupancy: 1 ( 24 / 24 )
Achieved occupancy: 0.333333 (on 4 SMs)
Occupancy limiting factor: None

p.s. I don't claim that I wrote wundercode, but I just don't know how to proceed from here.


Answer 1:


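It seems the grid size of your kernel is too small to make full use of the SMs. With 4 blocks of 256 threads on 4 SMs, each SM presumably holds a single block, i.e. 256 of its 768 possible threads, which matches the achieved occupancy of 0.333. Why not decrease the block size and increase the grid size? I think that would help.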
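
To make the occupancy arithmetic behind this suggestion concrete, here is a rough sketch in Python. The function name `estimated_occupancy` is hypothetical, the per-SM limits (768 threads, 8 blocks, 4 SMs) are taken from the profiler output in the question, and the model assumes the scheduler spreads blocks evenly across SMs and that registers and shared memory are not limiting. It is only an estimate, not how the profiler itself computes the figure.

```python
import math

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def estimated_occupancy(grid_size, block_size, num_sms=4,
                        max_threads_per_sm=768, max_blocks_per_sm=8):
    """Rough per-SM occupancy estimate (hypothetical helper).

    Assumes blocks are spread evenly over the SMs and ignores
    register and shared-memory limits.
    """
    # Blocks assigned to the busiest SM under an even distribution.
    blocks_per_sm = math.ceil(grid_size / num_sms)
    # How many blocks fit on one SM given the thread limit.
    fit_by_threads = max_threads_per_sm // block_size
    resident_blocks = min(blocks_per_sm, max_blocks_per_sm, fit_by_threads)
    warps_per_block = math.ceil(block_size / WARP_SIZE)
    max_warps_per_sm = max_threads_per_sm // WARP_SIZE
    return resident_blocks * warps_per_block / max_warps_per_sm


# Launch configuration from the question: grid 4, block 256.
print(estimated_occupancy(4, 256))
# Same 1024 threads split into smaller blocks: grid 16, block 64.
print(estimated_occupancy(16, 64))
# Three times as many threads: grid 12, block 256.
print(estimated_occupancy(12, 256))
```

Under this simple model, splitting the same 1024 threads into smaller blocks leaves the estimate unchanged. It rises only when the launch puts more threads on each SM, so increasing the grid size is likely to help mainly if the total thread count grows with it. Note also that with 64-thread blocks the 8-blocks-per-SM limit would cap an SM at 512 of 768 threads.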



Source: https://stackoverflow.com/questions/7839428/using-overlap-kernel-time-and-utilization-to-optimize-ones-kernels
