Why does GCC on x86-64 insert a NOP inside of a function?

我的未来我决定 submitted on 2021-02-07 12:29:19

Question


Given the following C function:

#include <string.h>   /* declares strcpy */

void go(char *data) {
    char name[64];
    strcpy(name, data);
}

GCC 5 and 6 on x86-64 compile this (plain gcc -c -g -o, followed by objdump) to:

0000000000000000 <go>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 50             sub    $0x50,%rsp
   8:   48 89 7d b8             mov    %rdi,-0x48(%rbp)
   c:   48 8b 55 b8             mov    -0x48(%rbp),%rdx
  10:   48 8d 45 c0             lea    -0x40(%rbp),%rax
  14:   48 89 d6                mov    %rdx,%rsi
  17:   48 89 c7                mov    %rax,%rdi
  1a:   e8 00 00 00 00          callq  1f <go+0x1f>
  1f:   90                      nop
  20:   c9                      leaveq 
  21:   c3                      retq   

Is there any reason for GCC to insert the 90/nop at 1f or is that just a side-effect that might happen when no optimizations are turned on?

Note: This question is different from most others because it asks about nop inside a function body, not an external padding.

Compiler versions tested: GCC Debian 5.3.1-14 (5.3.1) and Debian 6-20160313-1 (6.0.0)


Answer 1:


That's weird, I'd never noticed stray nops in the asm output at -O0 before. (Probably because I don't waste my time looking at un-optimized compiler output.)

Usually nops inside functions are there to align branch targets, including function entry points like in the question Brian linked. (Also see -falign-loops in the gcc docs, which is enabled by default at -O2 and -O3 but not at -Os.)


In this case, the nop is part of the compiler noise for a bare empty function:

void go(void) {
    //char name[64];
    //strcpy(name, data);
}

    push    rbp
    mov     rbp, rsp
    nop                     # only present for gcc5, not gcc 4.9.3
    pop     rbp
    ret

See that code in the Godbolt Compiler Explorer so you can check the asm for other compiler versions and compile options.

(Not technically noise, but -O0 enables -fno-omit-frame-pointer, and at -O0 even empty functions set up and tear down a stack frame.)


Of course, that nop is not present at any non-zero optimization level. There's no debugging or performance advantage to that nop in the code in the question. (See the performance guide links in the x86 tag wiki, esp. Agner Fog's microarchitecture guide to learn about what makes code fast on current CPUs.)

My guess is that it's purely an artifact of gcc internals. This nop is there as a nop in the gcc -S asm output, not as a .p2align directive. gcc itself doesn't count machine code bytes, it just uses alignment directives at certain points to align important branch targets. Only the assembler knows how big a nop is actually needed to reach the given alignment.
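To make the last point concrete, here is a minimal sketch of the arithmetic an assembler has to do for an alignment directive. The helper name p2align_padding is hypothetical (it is not part of gcc or gas), and the sketch only computes how many filler bytes are needed, not which nop encodings get emitted.

```c
#include <stdint.h>

/* Hypothetical helper, not a gcc or gas API: number of padding bytes
   needed to advance `addr` to the next multiple of 2^log2_align,
   which is the boundary a `.p2align log2_align` directive asks for. */
uint64_t p2align_padding(uint64_t addr, unsigned log2_align)
{
    uint64_t align = (uint64_t)1 << log2_align;   /* e.g. 4 -> 16 bytes */
    uint64_t mask  = align - 1;
    /* Distance to the next boundary; 0 if already aligned. */
    return (align - (addr & mask)) & mask;
}
```

For example, p2align_padding(0x1f, 4) is 1 and p2align_padding(0x20, 4) is 0. The compiler cannot evaluate this itself because it never knows the byte offset of the current instruction; only the assembler does.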

The default -O0 tells gcc that you want it to compile fast and not make good code. This means the asm output tells you more about gcc internals than other -O levels, and very little about how to optimize or anything else.

If you're trying to learn asm, it's more interesting to look at the code at -Og, for example (optimize for debugging).

If you're trying to see how well gcc or clang do at making code, you should look at -O3 -march=native (or -O2 -mtune=intel, or whatever settings you build your project with). Puzzling out the optimizations made at -O3 is a good way to learn some asm tricks, though. -fno-tree-vectorize is handy if you want to see a non-vectorized version of something fully optimized other than that.
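As a rough sketch of something worth feeding to the compiler at different levels, a small loop like the one below tends to be more instructive than an empty function. The function name sum_ints is made up for illustration; try compiling it with gcc -S at -O0, -Og, -O3, and -O3 -fno-tree-vectorize and comparing the asm.

```c
#include <stddef.h>

/* Hypothetical example function: returns the sum of the first n
   elements of a. A simple reduction loop like this is a candidate
   for auto-vectorization at higher optimization levels. */
int sum_ints(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

The function takes its input as arguments and returns its result, so the optimizer cannot constant-fold the whole thing away, which keeps the generated asm meaningful at every level.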



Source: https://stackoverflow.com/questions/36646479/why-does-gcc-on-x86-64-insert-a-nop-inside-of-a-function
