Assembly - Swap function - Why it will not work?

会有一股神秘感。 提交于 2019-12-02 11:17:25
Alexey Frunze

You don't actually reserve memory on the stack that you use when you access a dword at [EBP-4]. It can get overwritten by things like interrupt routines, signal handlers, asynchronously called procedures, whatever applies in your OS.

The code should be like this instead:

swap:
    push  EBP
    mov   EBP,ESP           ; make a traditional stack frame

    sub   ESP, 4         ; reserve memory for a local variable at [EBP-4]

    mov   EBX, [EBP+12]        ; ebx = &x
    mov   EAX, DWORD [EBX]     ; eax = x
    mov   DWORD [EBP-4], EAX   ; [ebp-4] = eax = x
    mov   EDX, [EBP+8]         ; edx = &y
    mov   EAX, DWORD [EDX]     ; eax = y
    mov   DWORD [EBX], EAX     ; *&x = y
    mov   EAX, DWORD [EBP-4]   ; eax = x reloaded from the local
    mov   DWORD [EDX], EAX     ; *&y = x

    leave          ; remove locals (by restoring ESP), restore EBP

    ret

Also, make sure that you're passing as parameters the addresses of the variables x and y, not the values of the variables. push x+push y will pass the addresses of x and y in NASM but they will pass values of x and y in TASM and MASM.

Aside from Alexey's bugfix, you could make this significantly more efficient. (Of course inlining the swap and optimizing at the call site is even better.)

There's no need for a local temporary on the stack: you could instead reload one of the addresses twice, or save/restore ESI and use it as a temporary.

You're actually destroying EBX, which is call-preserved in all the normal C calling conventions. In most 32-bit x86 calling conventions, EAX, ECX, and EDX are the three call-clobbered registers you can use without saving/restoring, while the others are call-preserved. (So i.e. your caller expects you not to destroy their values, so you can only use them if you put back the original value. This is why EBP has to be restored after you use it for a frame pointer.)


What gcc -O3 -m32 does when compiling a stand-alone (not inlined) definition for a swap function is save/restore EBX so it has 4 registers to play with. clang chooses ESI.

void swap(int *px, int *py) {
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

On the Godbolt compiler explorer:

# gcc8.2 -O3 -m32 -fverbose-asm
# gcc itself emitted the comments on the following instructions
swap:
        push    ebx     #
        mov     edx, DWORD PTR [esp+8]    # px, px
        mov     eax, DWORD PTR [esp+12]   # py, py
        mov     ecx, DWORD PTR [edx]      # tmp, *px_3(D)
        mov     ebx, DWORD PTR [eax]      # tmp91, *py_5(D)
        mov     DWORD PTR [edx], ebx      # *px_3(D), tmp91
        mov     DWORD PTR [eax], ecx      # *py_5(D), tmp
        pop     ebx       #
        ret  

# DWORD PTR is the gas .intel_syntax equivalent of NASM's DWORD
# you can just remove them all because the register implies an operand size

It also avoids making a legacy stack-frame. You can add -fno-omit-frame-pointer to the compiler options to see code-gen with a frame pointer, if you want. (Godbolt will recompile and show you the asm. Very handy site for exploring compiler options and code changes.)

64-bit calling conventions already have args in registers, and have enough scratch regs so we just get 4 instructions, much more efficient.


As I mentioned, another option is to reload one of the pointer args twice:

swap:
       # without a push, offsets relative to ESP are smaller by 4
        mov     edx, [esp+4]    # edx = px   reused later
        mov     eax, [esp+8]    # eax = py   also reused later
        mov     ecx, [edx]      # ecx = tmp = *px   lives for the whole function

        mov     eax, [eax]      # eax = *py   destroying our register copy of py
        mov    [edx], eax       # *px = *py;  done with px, can now destroy it

        mov     edx, [esp+8]   # edx = py
        mov    [edx], ecx       # *py = tmp;
        ret  

Only 7 instructions instead of 8. Loading the same value twice is very cheap, and out-of-order execution means it's not a problem to have the store address ready quickly even though in program order it's only the instruction right before the store that loads the address.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!