My recent efforts to implement a thread/ mutex manager ended up in an 75% CPU load (4 core), while all four running threads were either in sleep or waiting for a mutex beein
As the documentation says, [t]he objects are locked by an unspecified series of calls to lock, try_lock, unlock. There is simply no way that can possibly be efficient if the mutexes are held by other threads for a significant period of time. There's no way the function can wait without spinning.