Futex throughput on Linux

♀尐吖头ヾ 提交于 2020-02-28 07:03:41

问题


I have an async API which wraps some IO library. The library uses C style callbacks, the API is C++, so natural choice (IMHO) was to use std::future/std::promise to build this API. Something like std::future<void> Read(uint64_t addr, byte* buff, uint64_t buffSize). However, when I was testing the implementation I saw that the bottleneck is the future/promise, more precisely, the futex used to implement promise/future. Since the futex, AFAIK, is user space and the fastest mechanism I know to sync two threads, I just switched to use raw futexes, which somewhat improved the situation, but not something drastic. The performance floating somewhere around 200k futex WAKEs per second. Then I stumbled upon this article - Futex Scaling for Multi-core Systems which quite matches the effect I observe with futexes. My questions is, since the futex too slow for me, what is the fastest mechanism on Linux I can use to wake the waiting side. I dont need anything more sophisticated than binary semaphore, just to signal IO operation completion. Since IO operations are very fast (tens of microseconds) switching to kernel mode not an option. Busy wait not an option too, since CPU time is precious in my case.

Bottom line, user space, simple synchronization primitive, shared between two threads only, only one thread sets the completion, only one thread waits for completion.

EDIT001: What if... Previously I said, no spinning in busy wait. But futex already spins in busy wait, right? But the implementation covers more general case, which requests global hash table, to hold the futexes, queues for all subscribers etc. Is it a good idea to mimic same behavior on some simple entity (like int), no locks, no atomics, no global datastructures and busy wait on it like futex already does?

来源:https://stackoverflow.com/questions/49629928/futex-throughput-on-linux

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!