Do any languages / compilers utilize the x86 ENTER instruction with a nonzero nesting level?

你的背包 2020-12-13 08:20

Those familiar with x86 assembly programming are very used to the typical function prologue / epilogue:

push ebp ; Save old frame pointer.
mov  ebp, esp ; Po         


        
4 Answers
  • 2020-12-13 09:04

    Our PARLANSE compiler (for fine-grain parallel programs on SMP x86) has lexical scoping.

    PARLANSE tries to generate many, many small parallel grains of computation, and then multiplexes them on top of threads (1 per CPU). In fact, the stack frames are heap allocated; we didn't want to pay the price of a "big stack" for each grain since we have many, and we didn't want to put a limit on how deep anything could recurse. Because of parallel forks, the stack is actually a cactus stack.

    Each procedure, on entry, builds a lexical display to enable access to surrounding lexical scopes. We considered using the ENTER instruction, but decided against it for two reasons:

    • As others have noted, it isn't particularly fast. MOV instructions do just as well.
    • We observed that the display is often sparse, and tends to be denser on the lexically deeper side. Most internal helper functions do fine with access only to their direct lexical parent; you don't always need access to all of your parents. Sometimes none.

    Consequently, the compiler figures out exactly which lexical scopes a function needs access to, and generates, in the function prologue where ENTER would go, just the MOV instructions to copy the part of the parent's display that is actually needed. That often turns out to be 1 or 2 pairs of moves.

    So we win twice on performance over using ENTER.

    IMHO, ENTER is now one of those legacy CISC instructions that seemed like a good idea when they were defined, but are outperformed by simple RISC-like instruction sequences that even Intel's x86 implementations optimize well.

  • 2020-12-13 09:11

    I did some instruction-counting statistics on Linux boots using the Simics virtual platform, and found that ENTER was never used. However, there were quite a few LEAVE instructions in the mix, with an almost 1:1 correlation between CALL and LEAVE. That would seem to corroborate the idea that ENTER is slow and expensive, while LEAVE is pretty handy. This was measured on a 2.6-series kernel.

    The same experiments on a 4.4-series and a 3.14-series kernel showed zero use of either LEAVE or ENTER. Presumably the newer GCC versions used to compile these kernels no longer emit LEAVE (or the compiler options are set differently).

  • 2020-12-13 09:16

    As Iwillnotexist Idonotexist pointed out, GCC does support nested functions in C as a GNU extension, using the syntax in the example below.

    However, it does not use the ENTER instruction. Instead, the variables that are used in nested functions are grouped together in the local variables area, and a pointer to this group is passed to the nested function. Interestingly, this "pointer to parent variables" (GCC calls it the static chain) is passed outside the normal argument registers: on x64 it is passed in r10, and on x86 (cdecl) it is passed in ecx, the register that MSVC's thiscall convention uses for the this pointer in C++ (which doesn't support nested functions anyway).

    #include <stdio.h>
    void func_a(void)
    {
        int a1 = 0x1001;
        int a2=2, a3=3, a4=4;
        int a5 = 0x1005;
    
        void func_b(int p1, int p2)
        {
            /* Use variables from func_a() */
            printf("a1=%d a5=%d\n", a1, a5);
        }
        func_b(1, 2);
    }
    
    int main(void)
    {
        func_a();
        return 0;
    }
    

    Compiled for 64-bit, this produces the following code (excerpt):

    00000000004004dc <func_b.2172>:
      4004dc:   push   rbp
      4004dd:   mov    rbp,rsp
      4004e0:   sub    rsp,0x10
      4004e4:   mov    DWORD PTR [rbp-0x4],edi
      4004e7:   mov    DWORD PTR [rbp-0x8],esi
      4004ea:   mov    rax,r10                    ; ptr to calling function "shared" vars
      4004ed:   mov    ecx,DWORD PTR [rax+0x4]
      4004f0:   mov    eax,DWORD PTR [rax]
      4004f2:   mov    edx,eax
      4004f4:   mov    esi,ecx
      4004f6:   mov    edi,0x400610
      4004fb:   mov    eax,0x0
      400500:   call   4003b0 <printf@plt>
      400505:   leave  
      400506:   ret    
    
    0000000000400507 <func_a>:
      400507:   push   rbp
      400508:   mov    rbp,rsp
      40050b:   sub    rsp,0x20
      40050f:   mov    DWORD PTR [rbp-0x1c],0x1001
      400516:   mov    DWORD PTR [rbp-0x4],0x2
      40051d:   mov    DWORD PTR [rbp-0x8],0x3
      400524:   mov    DWORD PTR [rbp-0xc],0x4
      40052b:   mov    DWORD PTR [rbp-0x20],0x1005
      400532:   lea    rax,[rbp-0x20]              ; Pass address of the a5/a1 group to the nested function
      400536:   mov    r10,rax                     ; in r10 (the static chain)!
      400539:   mov    esi,0x2
      40053e:   mov    edi,0x1
      400543:   call   4004dc <func_b.2172>
      400548:   leave  
      400549:   ret  
    

    Output from objdump --no-show-raw-insn -d -Mintel

    This would be equivalent to something more verbose like this:

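    #include <stdio.h>
    
    /* Explicit version of what GCC does implicitly: the variables shared
       with the nested function are grouped together, and a pointer to the
       group is passed as an extra argument (GCC passes it in r10 on x64). */
    struct func_a_ctx
    {
        int a1, a5;
    };
    
    void func_b(struct func_a_ctx *ctx, int p1, int p2)
    {
        /* Use variables from func_a() */
        printf("a1=%d a5=%d\n", ctx->a1, ctx->a5);
    }
    
    void func_a(void)
    {
        int a2=2, a3=3, a4=4;   /* not used by func_b, so not in the context */
        struct func_a_ctx ctx = {
            .a1 = 0x1001,
            .a5 = 0x1005,
        };
    
        func_b(&ctx, 1, 2);
    }
    
    int main(void)
    {
        func_a();
        return 0;
    }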
    
  • 2020-12-13 09:20

    enter is avoided in practice as it performs quite poorly - see the answers at "enter" vs "push ebp; mov ebp, esp; sub esp, imm" and "leave" vs "mov esp, ebp; pop ebp". There are a bunch of x86 instructions that are obsolete but are still supported for backwards compatibility reasons - enter is one of those. (leave is OK though, and compilers are happy to emit it.)

    Implementing nested functions in full generality as in Python is actually a considerably more interesting problem than simply selecting a few frame management instructions - search for 'closure conversion' and 'upwards/downwards funarg problem' and you'll find many interesting discussions.

    Note that the x86 was designed with Pascal-style languages in mind, which is why there are instructions to support nested functions (enter, leave), the pascal calling convention in which the callee pops a known number of arguments from the stack (ret K), bounds checking (bound), and so on. Strictly speaking, ret K dates back to the 8086, while enter, leave, and bound were added with the 80186. Many of these operations are now obsolete.
