Can CAS fail for all threads?

Submitted by 非 Y 不嫁゛ on 2019-12-10 18:39:04

Question


I'm reading the lock cmpxchg description (https://www.felixcloutier.com/x86/CMPXCHG.html):

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

Now consider two threads executing lock cmpxchg:

Thread 1                       Thread 2
mov ebx, 0x4000                mov ebx, 0x4000          ; address
mov edx, 0x62ab6               mov edx, 0x62ab8         ; new value
mov eax, 0x62ab1               mov eax, 0x62ab1         ; expected old value
lock cmpxchg [ebx], edx        lock cmpxchg [ebx], edx  ; <----- here

The question is: can the locked cmpxchg fail in both Thread 1 and Thread 2?

Since

the destination operand receives a write cycle without regard to the result of the comparison

I would guess that both threads could get the write cycle, and then both could be reverted because each compared against a stale value... But I'm not sure if this is correct.

Maybe I need to look at the CAS implementation details, but they are not specified in the Intel instruction reference (at least I could not find them).


Answer 1:


My understanding is that lock cmpxchg cannot fail spuriously - unlike LL/SC - assuming the value at the memory address indeed matches. It builds those guarantees from the cache coherency protocol by taking exclusive ownership of the cache line and not yielding it to other cores until the operation is done.

So the CAS can fail in every thread only if some other thread wrote to the memory location.
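To make this concrete, here is a minimal C++ sketch of the question's scenario, assuming `std::atomic<int>::compare_exchange_strong` stands in for lock cmpxchg (on x86 it is typically compiled to exactly that instruction). The harness name `race_once` is hypothetical. Two threads try to replace 0x62ab1 with different values; with no third writer, exactly one CAS should succeed on every run.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical harness (not from the original posts): two threads race to CAS
// the same word from the same expected value to different new values,
// mirroring the question. Returns how many of the two CAS operations succeeded.
int race_once() {
    std::atomic<int> word{0x62ab1};
    std::atomic<int> successes{0};

    auto worker = [&word, &successes](int new_val) {
        int expected = 0x62ab1;  // the "old" value both threads put in eax
        // On x86 this is typically a single lock cmpxchg.
        if (word.compare_exchange_strong(expected, new_val)) {
            successes.fetch_add(1);
        }
        // On failure, `expected` now holds the value the other thread wrote.
    };

    std::thread t1(worker, 0x62ab6);
    std::thread t2(worker, 0x62ab8);
    t1.join();
    t2.join();

    // Whichever thread won, the word holds exactly one of the two new values.
    int final_val = word.load();
    assert(final_val == 0x62ab6 || final_val == 0x62ab8);
    return successes.load();
}
```

Calling `race_once()` in a loop should always return 1: never 0 (both failed) and never 2 (both succeeded).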




Answer 2:


@the8472's answer is correct, but I wanted to add an alternate answer.

https://www.felixcloutier.com/x86/CMPXCHG.html already specifies the behaviour in enough detail to rule out the possibility of spurious failure. If it could fail for some reason other than the value in memory not matching eax, the docs would have to say so.

You can also note the fact that compilers use a single lock cmpxchg for C++11 std::atomic::compare_exchange_strong, from which you can conclude that compiler writers think lock cmpxchg can't spuriously fail.

#include <atomic>

bool cas_bool(std::atomic_int *a, int expect, int want) {
    return a->compare_exchange_strong(expect, want);
}

compiles to (gcc7.3 -O3):

cas_bool(std::atomic<int>*, int, int):
    mov     eax, esi
    lock cmpxchg    DWORD PTR [rdi], edx
    sete    al
    ret
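For contrast with the LL/SC case mentioned in the first answer: C++ also has `compare_exchange_weak`, which is permitted to fail spuriously even when the value matches, so it is normally used in a retry loop. On x86 both the weak and strong forms are typically compiled to the same lock cmpxchg; on LL/SC machines the weak form can map to a single load-linked/store-conditional attempt, while the strong form needs an internal retry loop. The helpers `fetch_multiply` and `parallel_doubling` are hypothetical names in a minimal sketch.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: atomically multiply `a` by `factor`, returning the old value.
// compare_exchange_weak may fail even when the value matches (as an LL/SC
// store-conditional can), so it is retried. On failure it reloads `old` with the
// current value, and the desired value `old * factor` is recomputed each attempt.
int fetch_multiply(std::atomic<int>& a, int factor) {
    int old = a.load(std::memory_order_relaxed);
    while (!a.compare_exchange_weak(old, old * factor)) {
        // retry with the refreshed `old`
    }
    return old;
}

// Usage check: 4 threads each double a shared value 5 times, starting from 1.
// Because every update is applied exactly once, the result must be 2^20.
int parallel_doubling() {
    std::atomic<int> value{1};
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&value] {
            for (int i = 0; i < 5; ++i) fetch_multiply(value, 2);
        });
    }
    for (auto& th : threads) th.join();
    int result = value.load();
    assert(result == (1 << 20));
    return result;
}
```

The loop is what makes spurious failure harmless: a failed attempt just costs another iteration, which is why the weak form is acceptable where a loop is needed anyway.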

See also "Can num++ be atomic for 'int num'?" for more details of how locked instructions are implemented internally and how they interact with MESI. (@the8472's answer is the short version: for an operand that doesn't cross a cache-line boundary, a core just holds onto that cache line so nothing else in the system can read or write it for the duration of the lock cmpxchg.)
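As a small illustration of the num++ case, here is a minimal C++ sketch assuming `std::atomic<int>`: `++` on an atomic is a single atomic read-modify-write (on x86 typically one locked instruction such as lock add or lock xadd), so concurrent increments are never lost. `count_with_threads` and its parameters are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical demo: several threads increment one shared atomic counter.
int count_with_threads(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                ++counter;  // atomic RMW; with a plain int, updates could be lost
            }
        });
    }
    for (auto& th : threads) th.join();
    int total = counter.load();
    assert(total == num_threads * increments_per_thread);  // no lost updates
    return total;
}
```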


the destination operand receives a write cycle without regard to the result of the comparison

The read + write pair is atomic with respect to all other observers in the system. The ordering you propose, read1 / read2 / write1 / abort write2, is impossible because lock cmpxchg is atomic, so read2 can't appear between read1 and write1 in the global order.
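The argument can be checked with a toy model. This is a hypothetical single-threaded simulation, not a description of real hardware: `model_cas` treats the locked read and the write cycle as one indivisible step and, like the quoted documentation, writes the old value back when the comparison fails. Since read2 cannot fall between read1 and write1, the only possible global orders are "Thread 1 then Thread 2" and the reverse, and `run_order` (also a hypothetical name) shows that exactly one CAS succeeds in each.

```cpp
#include <cassert>
#include <utility>

// Hypothetical model: `mem` stands for the word at [ebx]. The whole function
// runs as one step, modelling the atomic read+write pair of lock cmpxchg.
bool model_cas(int& mem, int expected, int desired) {
    int seen = mem;                              // locked read
    mem = (seen == expected) ? desired : seen;   // write cycle happens either way
    return seen == expected;
}

// The only global orders for two atomic CAS ops are T1;T2 and T2;T1.
// Returns {thread1_succeeded, thread2_succeeded} for the chosen order.
std::pair<bool, bool> run_order(bool t1_first) {
    int mem = 0x62ab1;
    bool ok1, ok2;
    if (t1_first) {
        ok1 = model_cas(mem, 0x62ab1, 0x62ab6);
        ok2 = model_cas(mem, 0x62ab1, 0x62ab8);
    } else {
        ok2 = model_cas(mem, 0x62ab1, 0x62ab8);
        ok1 = model_cas(mem, 0x62ab1, 0x62ab6);
    }
    assert(ok1 != ok2);  // exactly one of the two succeeds
    return {ok1, ok2};
}
```

Whichever CAS goes second reads the first one's new value, so its comparison fails; neither order lets both fail.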

Also, that language only applies to the external memory bus. Modern CPUs with integrated memory controllers can do whatever they want (for lock cmpxchg on an address that's split across two cache lines). Intel may publish documentation for motherboard vendors to use in their internal testing of signals on the memory bus.

That documentation might still be relevant for lock cmpxchg on an MMIO address, but definitely not for an aligned operand in write-back memory. In that case, it's just a cache lock. (And it's a hidden implementation detail whether the L1d cache is written or not when the compare fails.) I guess you could test this by seeing whether it dirties the cache line (i.e. puts it in Modified state instead of Exclusive).

For more discussion about how lock cmpxchg might work internally vs. xchg, see the chat thread between me and @BeeOnRope following my answer on "Exit critical region": https://chat.stackoverflow.com/transcript/message/42472667#42472667. (It's mostly me having ideas that could work in theory but are incompatible with what we know about Intel x86 CPUs, and @BeeOnRope pointing out my mistakes.) There's very little we can conclude for sure about the fine details of efficiency of xchg vs. lock cmpxchg. It's certainly possible that xchg keeps the cache line locked for fewer cycles than lock cmpxchg, but that needs to be tested. I think xchg has better latency if used back-to-back on the same location from a single thread, though.



Source: https://stackoverflow.com/questions/50268929/can-cas-fail-for-all-threads
