Why is my custom loop faster? Bad compiler? Unsafe custom code? Luck (lucky cache hits)?

失恋的感觉 2021-01-16 15:08

I just started learning assembly and wrote a custom loop for swapping two variables, using C++'s asm{} block with the Digital Mars compiler in C-Free 5.0.

Enabled th

5 answers
  •  深忆病人
    2021-01-16 15:39

    The code generated by that compiler is pretty horrible. After disassembling the object file with objconv, here's what I got for the first for loop.

    ?_001:  cmp     dword [ebp-4H], 200000000               ; 0053 _ 81. 7D, FC, 0BEBC200
            jge     ?_002                                   ; 005A _ 7D, 17
            inc     dword [ebp-4H]                          ; 005C _ FF. 45, FC
            mov     eax, dword [ebp-18H]                    ; 005F _ 8B. 45, E8
            mov     dword [ebp-10H], eax                    ; 0062 _ 89. 45, F0
            mov     eax, dword [ebp-14H]                    ; 0065 _ 8B. 45, EC
            mov     dword [ebp-18H], eax                    ; 0068 _ 89. 45, E8
            mov     eax, dword [ebp-10H]                    ; 006B _ 8B. 45, F0
            mov     dword [ebp-14H], eax                    ; 006E _ 89. 45, EC
            jmp     ?_001                                   ; 0071 _ EB, E0
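
    For readers who don't read assembly, the listing appears to correspond to a plain temp-variable swap in a counted loop. The sketch below is a reconstruction from the disassembly, not the original poster's source: the function name `swap_loop` and the variable names are assumptions, and the mapping of stack slots to variables is inferred from the operands.

    ```cpp
    #include <utility>

    // Hypothetical reconstruction of the loop behind the listing.
    // Stack-slot mapping inferred from the disassembly (names are assumptions):
    //   [ebp-4H]  -> i     (loop counter)
    //   [ebp-18H] -> a
    //   [ebp-14H] -> b
    //   [ebp-10H] -> temp
    std::pair<int, int> swap_loop(int a, int b, int iterations) {
        int temp = 0;
        for (int i = 0; i < iterations; i++) {
            temp = a;   // mov eax, [ebp-18H]  then  mov [ebp-10H], eax
            a = b;      // mov eax, [ebp-14H]  then  mov [ebp-18H], eax
            b = temp;   // mov eax, [ebp-10H]  then  mov [ebp-14H], eax
        }
        return {a, b};
    }
    ```

    The listing runs the loop 200000000 times, which is an even count, so the two values end up where they started. Each C++ statement turns into one load and one store through eax.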
    

    The issues should be clear to anybody who ever looked at some assembly.

    1. The loop is very tightly dependent on the value that is put in eax. This makes any out-of-order execution practically impossible due to dependencies created on that register by every next instruction.

    2. There are six general-purpose registers available (since ebp and esp aren't really general-purpose in most of the setups), but your compiler uses none of them, falling back to using the local stack. This is absolutely unacceptable when speed is the optimization goal. We can even see that the current loop index is stored at [ebp-4H], while it could've been easily stored in a register.

    3. The cmp instruction compares a memory operand against an immediate, so the loop counter is reloaded from the stack on every iteration. With the counter in a register this would be a cheaper register-immediate compare.

    4. And don't get me started on the code size. Half of those instructions are just unnecessary.
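
    Points 1 and 2 come down to every variable living in memory. As a rough illustration only (this is not the original poster's code, and both function names are hypothetical), marking locals `volatile` obliges a compiler to perform every read and write, much like the listing does, while plain locals leave an optimizing compiler free to keep them in registers. What a particular compiler actually emits depends on its optimization settings.

    ```cpp
    #include <utility>

    // Every read and write of a volatile object has to be performed, so this
    // version behaves like the stack-bound listing: load, store, load, store.
    std::pair<int, int> swap_in_memory(int a0, int b0, int iterations) {
        volatile int a = a0;
        volatile int b = b0;
        volatile int temp = 0;
        for (volatile int i = 0; i < iterations; i = i + 1) {
            temp = a;
            a = b;
            b = temp;
        }
        int result_a = a;  // copy out of the volatile objects before returning
        int result_b = b;
        return {result_a, result_b};
    }

    // Plain locals: an optimizing compiler may keep a, b, temp and i in
    // registers, or simplify the loop further. Nothing here forces that.
    std::pair<int, int> swap_in_locals(int a, int b, int iterations) {
        for (int i = 0; i < iterations; ++i) {
            int temp = a;
            a = b;
            b = temp;
        }
        return {a, b};
    }
    ```

    Both functions return the same values for the same inputs; the difference is only in how much freedom the compiler has when generating the loop.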

    All in all, the first thing I'd do is ditch that compiler at the earliest possible chance. But then again, seeing that it offers "memory models" as one of its options, one can't really have much hope for it.
