Correct way to wrap CMPXCHG8B in GCC inline assembly, 32 bits

Submitted by 家住魔仙堡 on 2019-12-05 03:49:26

How about the following, which seems to work for me in a small test:

#include <stdint.h>

int sbcas(uint64_t* ptr, uint64_t oldval, uint64_t newval)
{
    int changed = 0;
    __asm__ __volatile__ (
        "push %%ebx\n\t" // -fPIC uses ebx, so save it
        "mov %4, %%ebx\n\t" // load ebx with the low half of newval
        "lock\n\t"
        "cmpxchg8b %0\n\t" // perform CAS operation (caveat: wrong address if %0 is ESP-relative, since push moved ESP)
        "setz %%al\n\t" // eax is an output via "+A", so it may be overwritten
        "pop %%ebx\n\t" // restore ebx; POP leaves al untouched
        "movzx %%al, %1\n\t" // store result in 'changed' after the pop, in case %1 was allocated to ebx
        : "+m" (*ptr), "=r" (changed), "+A" (oldval) // cmpxchg8b writes edx:eax on failure
        : "c" ((uint32_t)(newval >> 32)), "r" ((uint32_t)(newval & 0xffffffff))
        : "flags", "memory"
        );
    return changed;
}

If this also gets miscompiled, could you please include a small snippet that triggers the behavior?

Regarding the bonus question, I don't think it is possible to branch after the asm block on the condition code set by the cmpxchg8b instruction (unless you use asm goto or similar functionality). From the GCC manual (GNU C Language Extensions):

It is a natural idea to look for a way to give access to the condition code left by the assembler instruction. However, when we attempted to implement this, we found no way to make it work reliably. The problem is that output operands might need reloading, which would result in additional following "store" instructions. On most machines, these instructions would alter the condition code before there was time to test it. This problem doesn't arise for ordinary "test" and "compare" instructions because they don't have any output operands.
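The quoted passage predates GCC 6, which added flag output operands on x86: a constraint of the form "=@cc<cond>", such as "=@ccz", hands a condition flag straight to a C variable, so no SETZ is needed and the compiler can branch on ZF itself. A minimal sketch, assuming GCC 6 or later (or a clang that defines `__GCC_ASM_FLAG_OUTPUTS__`); `cas64_flags` is a hypothetical name. Since GCC 5, EBX is no longer reserved in 32-bit PIC code, so the sketch uses a plain "b" constraint; on non-x86 targets it falls back to the `__atomic_compare_exchange_n` builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit compare-and-swap. On success *ptr becomes
   desired; on failure *expected receives the value found in *ptr. */
static bool cas64_flags(uint64_t *ptr, uint64_t *expected, uint64_t desired)
{
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo = (uint32_t)*expected;          /* compare value, low half  -> EAX */
    uint32_t hi = (uint32_t)(*expected >> 32);  /* compare value, high half -> EDX */
    bool ok;
    __asm__ __volatile__(
        "lock; cmpxchg8b %[mem]"
        : "=@ccz"(ok),           /* ZF after cmpxchg8b: 1 if the swap happened */
          [mem] "+m"(*ptr),
          "+a"(lo), "+d"(hi)     /* on failure, EDX:EAX receive the current value */
        : "b"((uint32_t)desired), "c"((uint32_t)(desired >> 32))
        : "memory");
    *expected = ((uint64_t)hi << 32) | lo;
    return ok;
#else
    /* Not x86, or no flag outputs: let the compiler emit the CAS. */
    return __atomic_compare_exchange_n(ptr, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```

The caller can then write `if (cas64_flags(&word, &old, new)) ...` and GCC is free to turn that test into a direct JZ/JNZ on the flag.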

EDIT: I can't find any source that says one way or the other whether it is OK to modify the stack while also using the %N operands (this ancient link says "You can even push your registers onto the stack, use them, and put them back," but its example has no inputs).

But it should be possible to do without that by fixing the values in other registers:

int sbcas(uint64_t* ptr, uint64_t oldval, uint64_t newval)
{
    int changed;
    uint32_t oldhi = (uint32_t)(oldval >> 32); // edx is read and, on failure, written by cmpxchg8b
    __asm__ __volatile__ (
        "push %%ebx\n\t" // -fPIC uses ebx
        "mov %%edi, %%ebx\n\t" // load ebx with needed value
        "lock\n\t"
        "cmpxchg8b (%%esi)\n\t" // address comes from a register, so the push cannot shift it
        "setz %%al\n\t" // eax is the 'changed' output, so overwriting it is fine
        "movzx %%al, %0\n\t"
        "pop %%ebx\n\t"
        : "=a" (changed), "+d" (oldhi)
        : "0" ((uint32_t)(oldval & 0xffffffff)), "S" (ptr), "c" ((uint32_t)(newval >> 32)), "D" ((uint32_t)(newval & 0xffffffff))
        : "flags", "memory"
        );
    return changed;
}

This is what I have:

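#include <stdbool.h>
#include <stdint.h>

bool
spin_lock(int64_t* lock, int64_t thread_id, int tries)
{
    register int32_t pic_hack asm("ebx") = thread_id & 0xffffffff;
retry:
    if (tries-- > 0) {
        // The expected value 0 is loaded into EDX:EAX inside the asm and both
        // registers are clobbered: a failed CMPXCHG8B overwrites them, and this
        // asm goto has no outputs through which to report that.
        asm goto ("xorl %%eax, %%eax\n\t"
                  "xorl %%edx, %%edx\n\t"
                  "lock; cmpxchg8b %0\n\t"
                  "jnz %l[retry]"
                  :
                  : "m" (*lock), "c" ((int32_t) (thread_id >> 32)), "r" (pic_hack)
                  : "eax", "edx", "memory", "cc"
                  : retry);
        return true;
    }
    return false;
}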

It uses the asm goto feature, new in GCC 4.5, which allows jumps from inline assembly to C labels. (Oh, I see your comment about having to support old versions of GCC. Oh well. I tried. :-P)
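If a compiler builtin is acceptable instead of hand-written asm, GCC's `__sync_bool_compare_and_swap` (available since GCC 4.1, so older than asm goto) performs a 64-bit CAS; on 32-bit x86 it can only be emitted inline as `lock cmpxchg8b` when targeting i586 or later (e.g. `-march=i586`). The sketch below, `spin_lock_builtin`, is a hypothetical variant of `spin_lock` that assumes the same contract as the code above: the lock word is 0 when free and holds the owner's `thread_id` when taken.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical builtin-based variant of spin_lock: try up to 'tries'
   times to change *lock from 0 to thread_id. */
bool spin_lock_builtin(int64_t *lock, int64_t thread_id, int tries)
{
    while (tries-- > 0) {
        /* Atomically: if (*lock == 0) { *lock = thread_id; succeed } */
        if (__sync_bool_compare_and_swap(lock, (int64_t)0, thread_id))
            return true;
    }
    return false;
}
```

The builtin also acts as a full memory barrier and leaves register allocation (including EBX under -fPIC) entirely to the compiler, which sidesteps the miscompilation described next.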

Amazingly enough, the code fragment in the question still gets miscompiled in some circumstances: if the address of asm operand 0 is formed relative to EBX (the PIC register) before EBX is loaded through the register asm variable, GCC goes on to access the operand through EBX after EBX has already been set to set & 0xFFFFFFFF!

This is the code I am trying to make work now (EDIT: avoids push/pop):

asm ("movl %%edi, -4(%%esp);"    // Caution: i386 has no red zone, so a signal handler may overwrite this slot
     "leal %0, %%edi;"
     "xchgl %%ebx, %%esi;"
     "lock; cmpxchg8b (%%edi);"  // Sets ZF
     "xchgl %%esi, %%ebx;"       // Restores EBX and the ESI input; preserves ZF
     "movl -4(%%esp), %%edi;"    // Preserves ZF
     "setz %1;"                  // Reads ZF
     : "+m" (*a), "=q" (ret), "+A" (*cmp)
     : "S" ((int32)(set & 0xFFFFFFFF)), "c" ((int32)(set >> 32))
     : "flags");

The idea here is to load the operands before clobbering EBX, and to avoid any indirect addressing while EBX holds the value for CMPXCHG8B. I pin the lower half of the operand to the hard register ESI because otherwise GCC would feel free to reuse another register that is already taken if it could prove that it holds the same value. EDI is saved manually, because simply adding it to the clobber list makes GCC fail with "impossible reloads", probably due to high register pressure. PUSH/POP is avoided for saving EDI because other operands might be ESP-addressed.
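The same idea can be packaged as a complete function. A sketch, assuming a 32-bit x86 target (i586 or later) and GCC-style inline asm; `cas64_xchg` is a hypothetical name. Instead of computing the address with LEA into a manually saved EDI, it passes the pointer in EDI as an input operand, so nothing has to be stored below ESP. EBX is restored with a second XCHG, which also hands ESI back its input value (ESI is an input-only operand, so the asm must not leave it changed). On any other target it falls back to the `__atomic_compare_exchange_n` builtin, since a 32-bit XCHG of EBX would zero the upper half of RBX on x86-64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit CAS. On failure *expected gets the current value. */
static bool cas64_xchg(uint64_t *ptr, uint64_t *expected, uint64_t desired)
{
#if defined(__i386__)
    uint64_t cmp = *expected;        /* lives in EDX:EAX via "+A" */
    unsigned char ok;
    __asm__ __volatile__(
        "xchgl %%ebx, %%esi\n\t"       /* EBX <- low half of desired, ESI <- caller's EBX */
        "lock; cmpxchg8b (%%edi)\n\t"  /* address is in EDI, never EBX or ESP-relative */
        "xchgl %%ebx, %%esi\n\t"       /* restore EBX and ESI; XCHG leaves ZF alone */
        "setz %0"
        : "=q"(ok), "+A"(cmp)
        : "D"(ptr), "S"((uint32_t)desired), "c"((uint32_t)(desired >> 32))
        : "memory", "cc");
    *expected = cmp;
    return ok != 0;
#else
    /* Not 32-bit x86: let the compiler emit the CAS. */
    return __atomic_compare_exchange_n(ptr, expected, desired, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
#endif
}
```

Because "=q" can only pick EBX or ECX here (EDX:EAX are taken by `cmp`), the final SETZ always writes a register whose input value has already been consumed.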
