How do I atomically move a 64bit value in x86 ASM?

Submitted by 雨燕双飞 on 2020-06-27 12:59:47

Question


First, I found this question: How do I atomically read a value in x86 ASM? But it's a bit different: in my case I want to atomically assign a floating-point (64-bit double) value in a 32-bit application.

From: "Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A"

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

Reading or writing a quadword aligned on a 64-bit boundary

Is it actually possible using some assembly trick?


Answer 1:


In 64-bit x86 asm, you can use an integer mov rax, [rsi], x87, or SSE2. As long as the address is 8-byte aligned (or, on Intel P6 and later CPUs, doesn't cross a cache-line boundary), the load or store will be atomic.


In 32-bit x86 asm, your only option using only integer registers is lock cmpxchg8b, but that sucks for a pure load or pure store. (You can use it as a load by setting expected = desired = 0, though that doesn't work on read-only memory.) (gcc/clang use lock cmpxchg16b for atomic<struct_16_bytes> in 64-bit mode, but some compilers simply choose to make 16-byte objects not lock-free.)
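As a sketch, a pure load via lock cmpxchg8b with all four registers zeroed could look like this (NASM syntax; [esi] is a placeholder address):

; atomic 64-bit load from [esi] into edx:eax, 32-bit mode, no SSE needed
; note: cmpxchg8b always issues a write, so [esi] must be writable memory
xor   eax, eax              ; expected low  = 0
xor   edx, edx              ; expected high = 0
xor   ebx, ebx              ; desired low   = 0 (stored only if the value was already 0)
xor   ecx, ecx              ; desired high  = 0
lock cmpxchg8b [esi]        ; edx:eax = old value of [esi] either way

Whether the compare succeeds or fails, edx:eax ends up holding the old 64-bit value, which is why this works as a load.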

So the answer is: don't use integer regs: fild qword / fistp qword can copy any bit-pattern without changing it. (As long as the x87 precision control is set to full 64-bit mantissa). This is atomic for aligned addresses on Pentium and later.
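A minimal sketch of the x87 approach, copying a 64-bit value between two aligned locations ([esi]/[edi] are placeholder addresses):

; atomic 64-bit copy via the x87 stack; fild/fistp preserve the bit
; pattern as long as precision control is at its default 64-bit setting
fild  qword [esi]           ; atomic 8-byte load into st0
fistp qword [edi]           ; atomic 8-byte store, pops st0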

On a modern x86, use SSE2 movq load or store. e.g.

; atomically store edx:eax to qword [edi], assuming [edi] is 8-byte aligned
movd   xmm0, eax
pinsrd xmm0, edx, 1         ; SSE4.1: insert edx as the second dword
movq   [edi], xmm0
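The matching pure load could look like this (again a sketch; pextrd is likewise SSE4.1):

; atomically load qword [edi] into edx:eax, assuming [edi] is 8-byte aligned
movq   xmm0, [edi]          ; atomic 8-byte load
movd   eax, xmm0            ; low dword
pextrd edx, xmm0, 1         ; SSE4.1: extract the second dword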

With only SSE1 available, use movlps. (For loads, you may want to break the false dependency on the old value of the XMM register with xorps.)
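For example, an SSE1-only load/store pair might look like this (a sketch; [esi]/[edi] are placeholder addresses):

; atomic 64-bit load/store with only SSE1 (movlps)
xorps  xmm0, xmm0           ; break the false dependency on xmm0's old value
movlps xmm0, [esi]          ; atomic 8-byte load into the low half of xmm0
movlps [edi], xmm0          ; atomic 8-byte store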

With MMX, movq to/from mm0-7 works.


gcc uses SSE2 movq, SSE1 movlps, or x87 fild/fistp, in that order of preference, for std::atomic<int64_t> in 32-bit mode. Clang -m32 unfortunately uses lock cmpxchg8b even when SSE2 is available: LLVM bug 33109.

Some versions of gcc are configured so that -msse2 is on by default even with -m32 (in which case you could use -mno-sse2 or -march=i486 to see what gcc does without it).

I put load and store functions on the Godbolt compiler explorer to see the asm from gcc with x87, SSE, and SSE2, and from clang 4.0.1 and ICC18.

gcc bounces through memory as part of int->xmm or xmm->int, even when SSE4 (pinsrd / pextrd) is available. This is a missed optimization (gcc bug 80833). In 64-bit mode it favours ALU movd + pinsrd / pextrd with -mtune=intel or -mtune=haswell, but apparently not in 32-bit mode, or not for this use-case (64-bit integers in XMM instead of proper vectorization). Anyway, remember that only the load or store from atomic<long long> shared has to be atomic; the other loads/stores to the stack are private.



Source: https://stackoverflow.com/questions/48046591/how-do-i-atomically-move-a-64bit-value-in-x86-asm
