spinlock

How does x86 pause instruction work in spinlock *and* can it be used in other scenarios?

﹥>﹥吖頭↗ posted on 2019-11-28 04:40:21
The pause instruction is commonly used in the loop that tests a spinlock while some other thread owns it, to ease the tight loop. It is said to be equivalent to a few NOP instructions. Could somebody explain exactly how it helps spinlock optimization? It seems to me that even NOP instructions waste CPU time. Will they decrease CPU usage? My other question is whether I could use the pause instruction for other, similar purposes. For example, I have a busy thread that keeps scanning some places (e.g. a queue) to retrieve new nodes; however, sometimes the queue is empty and
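
A minimal sketch of where the instruction usually sits, assuming a C++11 compiler. The names `cpu_relax`, `PauseSpinLock` and `count_with_lock` are hypothetical and not from the question. `_mm_pause()` is the compiler intrinsic for PAUSE on x86. PAUSE is a hint to the processor that the code is in a spin-wait loop; it does not put the thread to sleep, so the OS still sees the core as busy.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
// On x86, _mm_pause() emits the PAUSE instruction.
inline void cpu_relax() { _mm_pause(); }
#else
// Assumption: on other architectures, fall back to yielding the thread.
inline void cpu_relax() { std::this_thread::yield(); }
#endif

// Hypothetical test-and-test-and-set lock, for illustration only.
class PauseSpinLock {
public:
    void lock() {
        for (;;) {
            // exchange() returns the previous value: false means we got the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Lock is held: poll with plain loads, pausing between polls.
            while (locked_.load(std::memory_order_relaxed)) cpu_relax();
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Runs `threads` workers that each increment a shared counter `iters` times.
long count_with_lock(int threads, int iters) {
    PauseSpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the lock provides mutual exclusion, `count_with_lock(4, 10000)` returns 40000. The same `cpu_relax()` call could be placed in the queue-polling loop the question describes, though whether that is a good fit depends on how long the queue typically stays empty.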

x86 spinlock using cmpxchg

…衆ロ難τιáo~ posted on 2019-11-27 20:22:56
Question: I'm new to using gcc inline assembly, and was wondering whether, on an x86 multi-core machine, a spinlock (without race conditions) could be implemented as follows (using AT&T syntax):
    spin_lock:
        mov 0 eax
        lock cmpxchg 1 [lock_addr]
        jnz spin_lock
        ret
    spin_unlock:
        lock mov 0 [lock_addr]
        ret
Answer 1: You have the right idea, but your asm is broken: cmpxchg can't work with an immediate operand, only registers. lock is not a valid prefix for mov. A mov to an aligned address is atomic on x86, so you don't need lock
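
The same idea can be sketched without hand-written asm, assuming C++11 `std::atomic` is acceptable. `CasSpinLock` and `cas_count` are hypothetical names. On x86, compilers typically turn the compare-exchange into `lock cmpxchg` and the release store into a plain `mov`, which matches the points made in the answer, but the exact code generated depends on the compiler.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spinlock: 0 = free, 1 = held.
class CasSpinLock {
public:
    void lock() {
        int expected = 0;
        // On failure, compare_exchange_weak writes the observed value into
        // `expected`, so it must be reset to 0 before retrying.
        while (!state_.compare_exchange_weak(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
            expected = 0;
        }
    }
    // A release store is enough to unlock; no read-modify-write is needed.
    void unlock() { state_.store(0, std::memory_order_release); }

private:
    std::atomic<int> state_{0};
};

// Runs `threads` workers that each increment a shared counter `iters` times.
long cas_count(int threads, int iters) {
    CasSpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the lock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If the lock provides mutual exclusion, `cas_count(4, 10000)` returns 40000.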

What exactly are “spin-locks”?

混江龙づ霸主 posted on 2019-11-27 19:41:02
Question: I always wondered what they are: every time I hear about them, images of futuristic flywheel-like devices go dancing (rolling?) through my mind... What are they?
Answer 1: When you use regular locks (mutexes, critical sections, etc.), the operating system puts your thread in the WAIT state and preempts it by scheduling other threads on the same core. This has a performance penalty if the wait time is really short, because your thread now has to wait for a preemption to receive CPU time again. Besides,

Do spin locks always require a memory barrier? Is spinning on a memory barrier expensive?

大兔子大兔子 posted on 2019-11-27 15:16:19
Question: I wrote some lock-free code that works fine with local reads, under most conditions. Does local spinning on a memory read necessarily imply that I ALWAYS have to insert a memory barrier before the spinning read? (To validate this, I managed to produce a reader/writer combination which results in a reader never seeing the written value, under certain very specific conditions--dedicated CPU, process attached to CPU, optimizer turned all the way up, no other work done in the loop--so the arrows do
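
One possible reading of the "reader never sees the write" case is that a plain, non-atomic read was hoisted out of the loop by the optimizer; that is an assumption here, since the excerpt is cut off. A minimal C++11 sketch of spinning on a flag with acquire/release ordering, where `read_after_flag` is a hypothetical name:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical writer/reader pair, for illustration only.
int read_after_flag() {
    int payload = 0;                 // plain data published by the writer
    std::atomic<bool> ready{false};  // flag the reader spins on

    std::thread writer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    // The atomic load must be performed on every iteration, so the compiler
    // cannot cache it in a register. The acquire ordering makes the write to
    // `payload` visible once `ready` is seen as true.
    while (!ready.load(std::memory_order_acquire)) {
        // spin
    }
    int seen = payload;
    writer.join();
    return seen;
}
```

Here `read_after_flag()` returns 42. On x86 an acquire load is typically an ordinary load, so the per-iteration cost of the spin itself is small; what matters is that the read is atomic and is repeated.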

Fastest inline-assembly spinlock

橙三吉。 posted on 2019-11-27 07:17:01
I'm writing a multithreaded application in C++ where performance is critical. I need to use a lot of locking while copying small structures between threads, and for this I have chosen to use spinlocks. I have done some research and speed testing on this, and I found that most implementations are roughly equally fast: Microsoft's CRITICAL_SECTION, with SpinCount set to 1000, scores about 140 time units. Implementing this algorithm with Microsoft's InterlockedCompareExchange scores about 95 time units. I've also tried to use some inline assembly with __asm {} using something like this code and it scores

Fastest inline-assembly spinlock

孤街醉人 posted on 2019-11-26 22:16:26
Question: I'm writing a multithreaded application in C++ where performance is critical. I need to use a lot of locking while copying small structures between threads, and for this I have chosen to use spinlocks. I have done some research and speed testing on this, and I found that most implementations are roughly equally fast: Microsoft's CRITICAL_SECTION, with SpinCount set to 1000, scores about 140 time units. Implementing this algorithm with Microsoft's InterlockedCompareExchange scores about 95 time units.

GLSL per-pixel spinlock using imageAtomicCompSwap

℡╲_俬逩灬. posted on 2019-11-26 14:54:07
Question: OpenGL red book version 9 (OpenGL 4.5) example 11.13 is "Simple Per-Pixel Mutex". It uses imageAtomicCompSwap in a do {} while() loop to take a per-pixel lock, preventing simultaneous access to a shared resource between pixel shader invocations corresponding to the same pixel coordinate.
    layout (binding = 0, r32ui) uniform volatile coherent uimage2D lock_image;
    void main(void)
    {
        ivec2 pos = ivec2(gl_FragCoord.xy);
        // spinlock - acquire
        uint lock_available;
        do {
            lock_available =

When should one use a spinlock instead of mutex?

我与影子孤独终老i posted on 2019-11-26 03:45:56
Question: I think both are doing the same job. How do you decide which one to use for synchronization?
Answer 1: The Theory. In theory, when a thread tries to lock a mutex and does not succeed because the mutex is already locked, it goes to sleep, immediately allowing another thread to run. It continues to sleep until it is woken up, which happens once the mutex is unlocked by whatever thread was holding the lock before. When a thread tries to lock a spinlock and it does not succeed,
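
The trade-off can be sketched as a hybrid that spins briefly and then sleeps, assuming C++11 `std::mutex`. This is an illustration, not something the answer excerpt proposes. `lock_spin_then_block`, `hybrid_count` and the `spin_limit` default are hypothetical, and the value 100 is not a recommendation.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: poll with try_lock() a bounded number of times
// (spinlock-like behaviour), then fall back to a blocking lock()
// (mutex-like behaviour, where the thread may be put to sleep).
inline void lock_spin_then_block(std::mutex& m, int spin_limit = 100) {
    for (int i = 0; i < spin_limit; ++i) {
        if (m.try_lock()) return;  // acquired while spinning
    }
    m.lock();  // give up spinning and wait for the lock
}

// Runs `threads` workers that each increment a shared counter `iters` times.
long hybrid_count(int threads, int iters) {
    std::mutex m;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock_spin_then_block(m);
                ++counter;  // protected by the mutex
                m.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

`hybrid_count(4, 10000)` returns 40000. A short spin can avoid the cost of sleeping and waking when the lock is held only briefly, while the blocking fallback avoids burning CPU time when it is held for long.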