If I set 3 threads to wait for a mutex to be released, do they form a queue based on the order they requested it in or is it undefined behaviour (i.e. we don't know which on
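The Mutex object is mostly fair: waiters are normally served in the order they started waiting. The APC case (a kernel-mode APC briefly pulls a waiting thread out of the wait queue, and the thread rejoins at the back once the APC completes) can occur, but it is not that common, especially if the thread is not doing I/O, or is doing its I/O synchronously or through completion ports.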
Most of the Windows user-mode locks (SRWLock, CriticalSection) are unfair if you can acquire them without blocking, but fair if you have to block in the kernel. They are designed this way to avoid lock convoys. With a strictly fair lock, the moment the lock becomes contended, every acquirer has to go through the scheduler and the context switch path before getting the lock. No one can 'skip ahead' and just take the lock because they happen to be running. Thus the lock acquire time for the last thread in the queue grows by the scheduling and context switch time of each thread ahead of it. This is a stable condition: the system does not recover from it until most of the external load is removed.
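To make the fair/unfair distinction concrete, here is a minimal portable sketch of the two policies. It is not how SRWLock or CriticalSection are implemented; `BargingLock`, `TicketLock` and `contended_count` are hypothetical names made up for this illustration, and both locks simply spin with `yield()` instead of blocking in the kernel.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Unfair ("barging") policy: whichever thread happens to be running when the
// flag clears can take the lock, regardless of how long others have waited.
class BargingLock {
    std::atomic<bool> locked_{false};
public:
    bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void lock() { while (!try_lock()) std::this_thread::yield(); }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

// Fair policy: strict FIFO by ticket number. A running thread cannot skip
// ahead; it must wait until every earlier ticket holder has been served.
class TicketLock {
    std::atomic<unsigned> next_{0};     // next ticket to hand out
    std::atomic<unsigned> serving_{0};  // ticket currently allowed in
public:
    unsigned take_ticket() { return next_.fetch_add(1, std::memory_order_relaxed); }
    void wait_turn(unsigned ticket) {
        while (serving_.load(std::memory_order_acquire) != ticket)
            std::this_thread::yield();
    }
    void lock() { wait_turn(take_ticket()); }
    void unlock() { serving_.fetch_add(1, std::memory_order_release); }
};

// Hypothetical helper: several threads increment a shared counter under the
// given lock type and the final count is returned.
template <class Lock>
long contended_count(int threads, int iters) {
    assert(threads > 0 && iters >= 0);
    Lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling `contended_count<TicketLock>(4, 1000)` and `contended_count<BargingLock>(4, 1000)` should both return 4000. The difference is in who gets in next: with the ticket lock every handoff has to reach one specific waiter, which is the behaviour that turns into a convoy once that waiter must be scheduled first, while the barging lock lets any running thread take the lock immediately.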
For performance, I would recommend using one of the aforementioned user-mode locks if they fit your scenario (they cannot be shared across processes, for example), since they are much faster than a kernel mutex.