In a distributed job system written in C++11 I have implemented a fence (i.e. a thread outside the worker thread pool may ask to block until all currently scheduled jobs are
Your diagnose is correct, this code is prone to lose condition notifications in the way you described. I.e. after one thread locked the mutex but before waiting on the condition variable another thread may call notify_all() so that the first thread misses that notification.
A simple fix is to lock the mutex before decrementing the counter and while notifying:
void task_pool::fence_impl(void *arg)
{
auto f = static_cast(arg);
std::unique_lock lock(f->resume_mutex);
if (--f->counter == 0) {
f->resume.notify_all();
}
else do {
f->resume.wait(lock);
} while(f->counter);
}
In this case the counter need not be atomic.
An added bonus (or penalty, depending on the point of view) of locking the mutex before notifying is (from here):
The pthread_cond_broadcast() or pthread_cond_signal() functions may be called by a thread whether or not it currently owns the mutex that threads calling pthread_cond_wait() or pthread_cond_timedwait() have associated with the condition variable during their waits; however, if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal().
Regarding the while loop (from here):
Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.