Is coalescing triggered for accessing memory in reverse order?

Submitted by ╄→гoц情女王★ on 2019-12-01 07:13:58

Question


Let's say I have several threads that access memory at addresses A+0, A+4, A+8, A+12 (each successive address is accessed by the next thread). Such access is coalesced, right?

However, if I access the same memory in reverse order, meaning:

thread 0 -> A+12
thread 1 -> A+8
thread 2 -> A+4
thread 3 -> A+0

Is coalescing also triggered here?


Answer 1:


Yes. On compute capability (cc) 2.0 and newer GPUs, coalescing occurs for any arbitrary assignment of 32-bit data elements to threads, as long as all the requested 32-bit elements come from the same 128-byte, 128-byte-aligned region of global memory.

The GPU has something like a "crossbar switch" in the memory controller that will distribute elements as needed. You may be interested in this GPU webinar which discusses coalescing and will illustrate this particular case pictorially (on slide 12).

The NVIDIA webinar page has other useful webinars you may be interested in as well.

For pre-cc2.0 devices the specifics vary by compute capability, but compute 1.0 and 1.1 capable devices do not have this ability to coalesce reads that are in "reverse order" or random order.




Answer 2:


It's also worth noting that one of the main purposes of the L2 cache in an NVIDIA GPU is to collapse reads and coalesce writes. So if one warp was accessing

thread 0 -> A+0
thread 1 -> A+8
thread 2 -> A+16
thread 3 -> A+24
...

and another warp was accessing

thread 0 -> A+4
thread 1 -> A+12
thread 2 -> A+20
thread 3 -> A+28
...

these two accesses will not coalesce inside the SM but generally will coalesce in the L2 cache, so that GPU memory will only be touched once.



Source: https://stackoverflow.com/questions/15029765/is-coalescing-triggered-for-accessing-memory-in-reverse-order
