In a distributed job system written in C++11 I have implemented a fence (i.e. a thread outside the worker thread pool may ask to block until all currently scheduled jobs are
In order to keep the higher performance of an atomic operation instead of a full mutex, you should change the wait condition into a lock, check and loop.
All condition waits should be done in that way. The condition variable even has a 2nd argument to wait which is a predicate function or lambda.
The code might look like:
void task_pool::fence_impl(void *arg)
{
auto f = (fence *)arg;
if (--f->counter == 0) // (1)
// we have zeroed this fence's counter, wake up everyone that waits
f->resume.notify_all(); // (2)
else
{
unique_lock lock(f->resume_mutex);
while(f->counter) {
f->resume.wait(lock); // (3)
}
}
}