I have implemented a depth peeling algorithm using a GLSL spinlock (inspired by this). In the following visualization, notice how overall the depth peeling algorithm functi
For reference, here is locking code that has been tested to work on Nvidia drivers 314.22 and 320.18 on a GTX 670. Note that reordering the code, or rewriting it into logically equivalent code, triggers existing compiler optimization bugs (see comments below). Note also that the code below uses bindless image references.
    // sem is initialized to zero
    coherent uniform layout(size1x32) uimage2D sem;

    void main(void)
    {
        ivec2 coord = ivec2(gl_FragCoord.xy);

        bool done = false;
        uint locked = 0;
        while(!done)
        {
            // locked = imageAtomicCompSwap(sem, coord, 0u, 1u); will NOT work
            locked = imageAtomicExchange(sem, coord, 1u);
            if (locked == 0)
            {
                performYourCriticalSection();

                memoryBarrier();

                imageAtomicExchange(sem, coord, 0u);

                // replacing this with a break will NOT work
                done = true;
            }
        }

        discard;
    }
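The "imageAtomicExchange(img2D_0,coord,0);" call needs to be inside the if statement. Outside it, the call resets the lock variable even for threads that never acquired the lock, which lets other threads enter the critical section while the real owner is still in it. Moving the call inside the if statement fixes it.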
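To make the placement of the release concrete, here is a minimal sketch of the same per-pixel lock with the correct and the incorrect location of the unlock marked. It is an illustration only and has not been tested on any driver. The names "sem" and "criticalSection" are hypothetical placeholders, and the sketch assumes core GLSL 4.20 with an r32ui image rather than the bindless setup used above.

```glsl
#version 420 core

// Hypothetical per-pixel lock image; assumed to be cleared to 0 by the host.
layout(r32ui) coherent uniform uimage2D sem;

// Hypothetical placeholder for the work that must not run concurrently
// for the same pixel.
void criticalSection(ivec2 coord)
{
}

void main()
{
    ivec2 coord = ivec2(gl_FragCoord.xy);
    bool done = false;

    while (!done)
    {
        // imageAtomicExchange returns the previous value:
        // 0u means this invocation has just taken the lock.
        uint previous = imageAtomicExchange(sem, coord, 1u);

        if (previous == 0u)
        {
            criticalSection(coord);

            // Make the critical section's writes visible before unlocking.
            memoryBarrier();

            // Correct: only the invocation that owns the lock releases it.
            imageAtomicExchange(sem, coord, 0u);
            done = true;
        }

        // Incorrect placement (left commented out on purpose): a release
        // here would also run for invocations that failed to acquire the
        // lock, resetting it while the owner is still inside.
        // imageAtomicExchange(sem, coord, 0u);
    }
}
```

The loop keeps the flag-based exit of the tested code instead of a break, since the answer above reports that the break variant does not work on the drivers it was tested on.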