Fermi L2 cache hit latency?

Submitted by 独自空忆成欢 on 2019-12-09 00:57:39

Question


Does anyone know any details about the L2 cache in Fermi? I have heard that it is as slow as global memory, and that the L2 serves only to enlarge the effective memory bandwidth. But I can't find any official source to confirm this. Has anyone measured the hit latency of L2? What about its size, line size, and other parameters?

In effect, how do L2 read misses affect performance? My sense is that L2 only matters in very memory-bound applications. Please feel free to give your opinions.

Thanks


Answer 1:


This thread on the NVIDIA forums has some measurements of performance characteristics. While it is not official information, and probably not 100% exact, it at least gives some indication of the behaviour, so I thought it might be useful here (measurements in clock cycles):

1020 non-cached (L1 enabled but not used)

1020 non-cached (L1 disabled)

365 L2 cached (L1 disabled)

88 L1 cached (L1 enabled and used)

Another post in the same thread gives these results:

1060 non-cached

248 L2

18 L1
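Numbers like those above are typically obtained with a single-threaded pointer-chase microbenchmark timed with the on-chip cycle counter. Below is a minimal sketch of that technique; it is my own illustration, not the code used in the forum thread, and the chain size and stride are assumptions (a 4 MB chain exceeds Fermi's L1 and L2, so it should approximate uncached latency; shrink it below 768 KB to land in L2, or below 16/48 KB for L1):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define ITERS 256

// One thread follows a pointer chain. Each load depends on the previous
// one, so the loads serialize and the elapsed cycles divided by ITERS
// approximate the latency of a single load.
__global__ void chase(const int *chain, long long *cycles, int *sink)
{
    int idx = 0;
    long long start = clock64();
    for (int i = 0; i < ITERS; ++i)
        idx = chain[idx];            // dependent load: cannot be overlapped
    *cycles = (clock64() - start) / ITERS;
    *sink = idx;                     // keep the loads from being optimized away
}

int main()
{
    // 4 MB chain, assumed large enough to miss both L1 and L2 on Fermi.
    const int n = 1 << 20;
    int *h_chain = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i)
        h_chain[i] = (i + 32) % n;   // stride of one 128-byte line (32 ints)

    int *d_chain, *d_sink;
    long long *d_cycles;
    cudaMalloc(&d_chain, n * sizeof(int));
    cudaMalloc(&d_cycles, sizeof(long long));
    cudaMalloc(&d_sink, sizeof(int));
    cudaMemcpy(d_chain, h_chain, n * sizeof(int), cudaMemcpyHostToDevice);

    chase<<<1, 1>>>(d_chain, d_cycles, d_sink);
    cudaDeviceSynchronize();

    long long cycles;
    cudaMemcpy(&cycles, d_cycles, sizeof(long long), cudaMemcpyDeviceToHost);
    printf("~%lld cycles per load\n", cycles);

    cudaFree(d_chain); cudaFree(d_cycles); cudaFree(d_sink);
    free(h_chain);
    return 0;
}
```

To separate L1 from L2 numbers, the same kernel can be compiled with `-Xptxas -dlcm=cg`, which makes global loads bypass L1 and cache only in L2.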




Answer 2:


It is not just as slow as global memory. I don't have a source explicitly saying so, but the CUDA programming guide states: "A cache line request is serviced at the throughput of L1 or L2 cache in case of a cache hit, or at the throughput of device memory, otherwise." For that statement to make sense, the throughputs must differ. And why would NVIDIA put in a cache with the same speed as global memory? On average it would be worse, because of cache misses.

About the latency, I don't know. The size of the L2 cache is 768KB, and the line size is 128 bytes. Section F.4 of the CUDA programming guide has some more bits of information, especially sections F.4.1 and F.4.2. The guide is available here: http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf
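The 768KB figure can also be confirmed at runtime: the CUDA runtime exposes the L2 size through `cudaDeviceProp::l2CacheSize`. A minimal sketch (querying device 0; error checking omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // query device 0
    // l2CacheSize is reported in bytes; a full GF100/GF110 Fermi part
    // should report 786432 bytes (768 KB).
    printf("%s: L2 cache size = %d KB\n", prop.name, prop.l2CacheSize / 1024);
    return 0;
}
```

Note that cut-down Fermi parts with fewer memory controllers ship with proportionally less L2, so the queried value is more reliable than the headline number.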



Source: https://stackoverflow.com/questions/6744101/fermi-l2-cache-hit-latency
