I have 3 processes which need to be synchronized. Process one does something then wakes process two and sleeps, which does something then wakes process three and sleeps, which d
(Sorry to give a second answer but this one would be too messy to clean up just with editing)
The answer is, I think, already in the original post for the question.
So, my question is why does sem3 timeout, even though the semaphore has been triggered and the value is clearly 1? I would never expect to see line 08 in the output. If it times out (because, say thread 2 has crashed or is taking too long), the value should be 0. And why does it work fine for 3 or 4 hours first before getting into this state?
So the scenario is:
sem_timedwaitsem_getvaluesem_post on sem3sem_getvalue
and sees a 1sem_post
on sem1This race condition is hard to trigger, basically you have to hit the tiny time window where one thread has had a problem in waiting for the semaphore and then reads the semaphore with the sem_getvalue. The occurrence of that condition is much dependent of the environment (type of system, number of cores, load, IO interrupts) so this explains why it only occurs after hours, if not at all.
Having the control flow depend of a sem_getvalue is generally a bad idea. The only atomic non-blocking access to a sem_t is through sem_post and sem_trywait.
So this example code from the question has that race condition. This doesn't mean that the original problem code that gillez had, does indeed have the same race condition. Perhaps the example is just too simplistic, and still shows the same phenomenon for him.
My guess is, in his original problem there was an unprotected sem_wait. That is a sem_wait that is only checked for its return value and not for errno in the event that it fails. EINTRs do occur on sem_wait quite naturally if the process has some IO. You have just do a do - while with check and reset of errno if you encounter a EINTR.