CUDA kernel for add(a,b,c) using texture objects for a & b: does it work correctly for the 'increment operation' add(a,b,a)?

三世轮回 submitted on 2020-01-25 11:31:46

Question


I want to implement a CUDA function 'add(a,b,c)' that adds two single-channel floating-point images 'a' and 'b' component-wise and stores the result in the floating-point image 'c', so 'c = a + b'. The function first binds texture objects 'aTex' and 'bTex' to the pitch-linear images 'a' and 'b'; inside the kernel, 'a' and 'b' are then accessed only through 'aTex' and 'bTex'. The sum is stored in 'c' via a simple write to global memory. What happens if I call the function to increment 'a' by 'b', i.e. 'add(a,b,a)'? Now the image 'a' is used in two places in the kernel: its values are read via the texture object 'aTex', and results are stored into it via the write to global memory. Is it possible that this usage of the 'add' function leads to incorrect results?
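A minimal sketch of the setup described above, assuming single-channel float images allocated with cudaMallocPitch, point sampling and unnormalized (texel) coordinates. The names makePitchTex, addKernel, checkInPlaceAdd and the signature of add (explicit pitch, width and height parameters) are hypothetical, since the question does not show its actual code. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: texture object over a pitch-linear, one-channel float
// image, with point sampling and unnormalized (texel) coordinates.
static cudaTextureObject_t makePitchTex(float* img, size_t pitch, int w, int h) {
    cudaResourceDesc res{};
    res.resType = cudaResourceTypePitch2D;
    res.res.pitch2D.devPtr = img;
    res.res.pitch2D.desc = cudaCreateChannelDesc<float>();
    res.res.pitch2D.width = w;
    res.res.pitch2D.height = h;
    res.res.pitch2D.pitchInBytes = pitch;

    cudaTextureDesc td{};
    td.addressMode[0] = cudaAddressModeClamp;
    td.addressMode[1] = cudaAddressModeClamp;
    td.filterMode = cudaFilterModePoint;  // exact texel values, no interpolation
    td.readMode = cudaReadModeElementType;
    td.normalizedCoords = 0;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// c(x,y) = a(x,y) + b(x,y); a and b are read only through textures,
// c is written with an ordinary global-memory store.
__global__ void addKernel(cudaTextureObject_t aTex, cudaTextureObject_t bTex,
                          float* c, size_t cPitch, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    // +0.5f addresses the texel centre.
    float sum = tex2D<float>(aTex, x + 0.5f, y + 0.5f) +
                tex2D<float>(bTex, x + 0.5f, y + 0.5f);
    float* row = reinterpret_cast<float*>(reinterpret_cast<char*>(c) + y * cPitch);
    row[x] = sum;
}

// Hypothetical host wrapper; each image uses the pitch from cudaMallocPitch.
void add(float* a, size_t aPitch, float* b, size_t bPitch,
         float* c, size_t cPitch, int w, int h) {
    cudaTextureObject_t aTex = makePitchTex(a, aPitch, w, h);
    cudaTextureObject_t bTex = makePitchTex(b, bPitch, w, h);
    dim3 block(16, 16);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    addKernel<<<grid, block>>>(aTex, bTex, c, cPitch, w, h);
    cudaDeviceSynchronize();  // let the kernel finish before destroying textures
    cudaDestroyTextureObject(aTex);
    cudaDestroyTextureObject(bTex);
}

// Usage: a(x,y) = x, b(x,y) = y, then the in-place call add(a, b, a).
bool checkInPlaceAdd(int w, int h) {
    const size_t rowBytes = w * sizeof(float);
    std::vector<float> ha(w * h), hb(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            ha[y * w + x] = float(x);
            hb[y * w + x] = float(y);
        }
    float *a = nullptr, *b = nullptr;
    size_t aPitch = 0, bPitch = 0;
    cudaMallocPitch(reinterpret_cast<void**>(&a), &aPitch, rowBytes, h);
    cudaMallocPitch(reinterpret_cast<void**>(&b), &bPitch, rowBytes, h);
    cudaMemcpy2D(a, aPitch, ha.data(), rowBytes, rowBytes, h, cudaMemcpyHostToDevice);
    cudaMemcpy2D(b, bPitch, hb.data(), rowBytes, rowBytes, h, cudaMemcpyHostToDevice);
    add(a, aPitch, b, bPitch, a, aPitch, w, h);  // c aliases a
    cudaMemcpy2D(ha.data(), rowBytes, a, aPitch, rowBytes, h, cudaMemcpyDeviceToHost);
    cudaFree(a);
    cudaFree(b);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (ha[y * w + x] != float(x + y)) return false;
    return true;
}
```

In checkInPlaceAdd each thread fetches a(x,y) through aTex and then overwrites that same texel, which is exactly the case the answer below discusses.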


Answer 1:


The GPU's texture cache is not coherent with global memory writes. This means that a global memory write to a particular location of the global memory underlying a texture may or may not be reflected in a subsequent texture access to that same location within the same kernel. So there is a read-after-write hazard in such a scenario.
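A minimal sketch of the hazard, assuming a 1D linear texture object instead of the pitch-linear images from the question; makeLinearTex, doubleThenReread and runHazardDemo are hypothetical names. The thread writes a[i] with a global store and then fetches a[i] through the texture again in the same kernel, so the second fetch is not guaranteed to return the new value.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: texture object over a linear float buffer (tex1Dfetch).
static cudaTextureObject_t makeLinearTex(float* d, int n) {
    cudaResourceDesc res{};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);
    cudaTextureDesc td{};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// HAZARD: a[i] is written through global memory and then read again through
// the texture in the same kernel. The texture path is not coherent with that
// write, so reread[i] is undefined: it may hold the old or the new value.
__global__ void doubleThenReread(cudaTextureObject_t aTex, float* a,
                                 float* reread, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    a[i] = 2.0f * tex1Dfetch<float>(aTex, i);  // read, then global write: fine
    reread[i] = tex1Dfetch<float>(aTex, i);     // read-after-write: unreliable
}

// Returns true if the global writes to a landed (they must). Optionally
// reports how many re-reads returned the old value; that count is unspecified.
bool runHazardDemo(int n, int* staleReads = nullptr) {
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = float(i + 1);
    float *a = nullptr, *reread = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&a), n * sizeof(float));
    cudaMalloc(reinterpret_cast<void**>(&reread), n * sizeof(float));
    cudaMemcpy(a, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaTextureObject_t aTex = makeLinearTex(a, n);
    doubleThenReread<<<(n + 255) / 256, 256>>>(aTex, a, reread, n);
    std::vector<float> ha(n), hr(n);
    cudaMemcpy(ha.data(), a, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hr.data(), reread, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(aTex);
    cudaFree(a);
    cudaFree(reread);
    bool ok = true;
    int stale = 0;
    for (int i = 0; i < n; ++i) {
        if (ha[i] != 2.0f * float(i + 1)) ok = false;
        if (hr[i] == float(i + 1)) ++stale;
    }
    if (staleReads) *staleReads = stale;
    return ok;
}
```

Nothing about the value of staleReads is guaranteed; only the global writes to a are.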

If, however, the code writes to a particular location of the global memory underlying a texture, and that location is never subsequently read via the texture during the lifetime of the kernel, there is no read-after-write hazard and the code will behave as expected. This is the situation in add(a,b,a), provided each thread reads an element of 'a' through 'aTex' only at the location it writes, and only before writing it. The updated data in global memory can then be accessed by a subsequent kernel in any manner desired, including texture access, as the texture cache is cleared upon kernel launch.
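A sketch of the safe pattern, assuming a 1D linear texture object; addOneInPlace, makeLinearTex and checkTwoLaunches are hypothetical names. Each thread reads x[i] through the texture once and then overwrites it, and a second launch of the same kernel reads the updated values through the same texture object.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: texture object over a linear float buffer (tex1Dfetch).
static cudaTextureObject_t makeLinearTex(float* d, int n) {
    cudaResourceDesc res{};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = n * sizeof(float);
    cudaTextureDesc td{};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// Safe in-place update: x[i] is fetched through the texture only by the
// thread that overwrites it, and only before the write.
__global__ void addOneInPlace(cudaTextureObject_t xTex, float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = tex1Dfetch<float>(xTex, i) + 1.0f;
}

// Launches the kernel twice with the same texture object. The second launch
// starts after the first one finished, so its texture reads see x0 + 1.
bool checkTwoLaunches(int n) {
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = float(i);
    float* x = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&x), n * sizeof(float));
    cudaMemcpy(x, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaTextureObject_t xTex = makeLinearTex(x, n);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    addOneInPlace<<<blocks, threads>>>(xTex, x, n);  // x = x0 + 1
    addOneInPlace<<<blocks, threads>>>(xTex, x, n);  // x = (x0 + 1) + 1
    cudaMemcpy(h.data(), x, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDestroyTextureObject(xTex);
    cudaFree(x);
    for (int i = 0; i < n; ++i)
        if (h[i] != float(i) + 2.0f) return false;
    return true;
}
```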

I have personally used this approach to speed up in-place operations with small strides, as the texture read path provided higher load performance. One example is the BLAS-1 operation [D|S|Z|C]SCAL in CUBLAS, which scales each array element by a scalar.
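A rough sketch of an SSCAL-like in-place scaling with stride incx, reading through a linear texture object and writing with global stores. This is not the CUBLAS implementation; sscalTex, scalKernel, makeLinearTex and checkStridedScal are hypothetical, only positive incx is handled, and the buffer must stay within the device's 1D linear texture size limit.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: texture object over 'count' floats of linear memory.
static cudaTextureObject_t makeLinearTex(float* d, size_t count) {
    cudaResourceDesc res{};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = d;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = count * sizeof(float);
    cudaTextureDesc td{};
    td.readMode = cudaReadModeElementType;
    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &res, &td, nullptr);
    return tex;
}

// x[k*incx] = alpha * x[k*incx] for k in [0, n). Each element is fetched
// through the texture by exactly one thread, before that thread overwrites it.
__global__ void scalKernel(int n, float alpha, cudaTextureObject_t xTex,
                           float* x, int incx) {
    for (int k = blockIdx.x * blockDim.x + threadIdx.x; k < n;
         k += gridDim.x * blockDim.x) {
        int idx = k * incx;
        x[idx] = alpha * tex1Dfetch<float>(xTex, idx);
    }
}

// Hypothetical host routine (positive incx only; error checks omitted).
void sscalTex(int n, float alpha, float* x, int incx) {
    if (n <= 0 || incx <= 0) return;
    cudaTextureObject_t xTex = makeLinearTex(x, size_t(n - 1) * incx + 1);
    const int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (blocks > 1024) blocks = 1024;  // the grid-stride loop covers the rest
    scalKernel<<<blocks, threads>>>(n, alpha, xTex, x, incx);
    cudaDeviceSynchronize();
    cudaDestroyTextureObject(xTex);
}

// Usage: scale every incx-th element by 3; the others must stay untouched.
bool checkStridedScal(int n, int incx) {
    const int len = n * incx;
    std::vector<float> h(len);
    for (int i = 0; i < len; ++i) h[i] = float(i + 1);
    float* x = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&x), len * sizeof(float));
    cudaMemcpy(x, h.data(), len * sizeof(float), cudaMemcpyHostToDevice);
    sscalTex(n, 3.0f, x, incx);
    cudaMemcpy(h.data(), x, len * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(x);
    for (int i = 0; i < len; ++i) {
        float expected = (i % incx == 0) ? 3.0f * float(i + 1) : float(i + 1);
        if (h[i] != expected) return false;
    }
    return true;
}
```

The grid is capped and a grid-stride loop is used so that large n does not require a huge grid; the cap of 1024 blocks is an arbitrary choice for the sketch.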



Source: https://stackoverflow.com/questions/26422082/cuda-kernel-for-adda-b-c-using-texture-objects-for-a-b-works-correctly-for
