Question
__kernel void example(__global int *a, __global int *dependency, uint cols)
{
    int j = get_global_id(0);
    int i = get_global_id(1);
    if (i > 0 && j > 0)
    {
        while (1)
        {
            test = 1;
        }
        // Wait for the dependents
        -----------------------------
        --------------------------
    }
}
In the kernel code above, why is the while loop simply skipped in all the threads instead of looping forever? Any ideas? I'm working on a problem that requires a thread to wait for some other threads to finish based on some criterion, but every time I run it on the GPU, the loop above, or a while(wait_condition) loop, is skipped.
Is there any other way of making a particular thread wait for other threads in an OpenCL kernel on a GPU?
Thanks in advance!
Answer 1:
At a high level, GPUs are data-parallel computing devices. They like to run the same task on different data. They don't do well when their tasks do different things.
Your code is illustrative of a task-parallel problem. So my high-level question is: what type of problem are you solving? If it's a task-parallel problem, then perhaps a GPU isn't the best solution. Would a multi-core CPU be an alternative?
Your code is a typical 'spinlock', where the code loops until a value changes. It's often used for short-term, lightweight locking in databases. This is dangerous code even on a CPU, as a mistake or error can lock up the CPU or GPU. For CPU code, a spinlock is usually protected with an interrupt timer. The usage is:
1) set a timer
2) spin until a value changes
3) continue or time out
So after the requisite number of milliseconds, the code is interrupted and an error is thrown. If you use the spinlock pattern, then for safety add a loop exit to the while statement that triggers after a suitable number of iterations.
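The loop exit suggested above might look roughly like this. It is a minimal sketch in plain C so that it can be compiled and tried on the host; the helper name bounded_spin_wait and the max_spins parameter are hypothetical and not part of OpenCL. Inside a kernel the flag pointer would also need the __global address-space qualifier, and even a bounded spin only helps if the work-item being waited on actually gets to run, which OpenCL does not promise across work-groups.

```c
/* Hypothetical helper: poll *flag until it becomes nonzero, but give up
 * after max_spins polls so that a missed update cannot hang the caller.
 * Returns 1 if the flag was seen set, 0 if the spin budget ran out. */
static int bounded_spin_wait(const volatile int *flag, unsigned int max_spins)
{
    unsigned int spins;
    for (spins = 0; spins < max_spins; ++spins) {
        if (*flag != 0) {
            return 1;   /* the awaited value changed: continue */
        }
    }
    return 0;           /* budget exhausted: treat as a time-out */
}
```

A caller would check the return value and report an error on 0, rather than continuing as if the dependency had been satisfied.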
In OpenCL reduction algorithms, it's typical for the zero thread (get_global_id(0) == 0) to write the final singleton result. Before that, the threads are synchronized using a barrier call. Note that barrier only synchronizes the work-items within a single work-group, not the whole launch:
__kernel
void mytask( ... , global float * result )
{
    int thread = get_global_id(0);
    // ... your code
    // Wait for the work-items in this work-group and flush their local and
    // global memory writes (a memory fence); see the OpenCL spec for details.
    barrier( CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE );
    if ( thread == 0 )        // zero thread
        result[0] = value;    // set the singleton result (computed in the elided code) as the zeroth array element
}
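To make the role of the barrier concrete, here is a rough sequential emulation in plain C of a tree reduction within one work-group. It is only a sketch under stated assumptions: the function name emulate_group_sum and the scratch-array layout are made up for illustration, the group size is assumed to be a power of two, and on a real device the inner loop would run as parallel work-items with a barrier at the point marked in the comment.

```c
/* Hypothetical emulation of a work-group sum reduction.
 * scratch holds one partial value per work-item; n must be a power of two.
 * The inner loop stands in for the work-items that would run in parallel. */
static float emulate_group_sum(float *scratch, unsigned int n)
{
    unsigned int stride, lid;
    for (stride = n / 2; stride > 0; stride /= 2) {
        for (lid = 0; lid < stride; ++lid) {
            /* work-item lid folds in its partner's partial sum */
            scratch[lid] += scratch[lid + stride];
        }
        /* a kernel would call barrier(CLK_LOCAL_MEM_FENCE) here so that
         * every partial sum is written before the next step reads it */
    }
    return scratch[0];  /* what work-item zero would store in result[0] */
}
```

Without the synchronization point, a work-item could read a partner's slot before the previous step had written it, which is why the zero thread only stores the result after the barrier.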
Source: https://stackoverflow.com/questions/9689329/is-there-any-way-of-making-a-particular-thread-to-wait-for-other-threads-upon-so