Generated Assembly For Pointer Arithmetic

Submitted by 情到浓时终转凉″ on 2021-02-05 08:10:08

Question


This is a simple question but I just came across it. In the code snippet below I create three pointers. I know the three will exhibit equivalent behavior (all point to the same thing), but I honestly thought the third statement in the code was the most "efficient", meaning that it would generate fewer assembly instructions to accomplish the same thing as the other two.

I assumed that the first two have to first dereference a pointer, then take the memory address of the thing that was dereferenced, and then set some pointer equal to that memory address. The third, I thought, just needed to increment a memory address by 1.

To my surprise, all three generate the same assembly instructions even with optimizations turned off: https://godbolt.org/z/Weefn4

Am I missing something obvious? Is there some compiler magic that simply recognizes these three as equivalent?

#include <stdio.h>
#include <stdint.h>

int main()
{
    unsigned int x[10];

    unsigned int* a = &x[1]; // Get address of dereferenced x[1]
    unsigned int* b = &(*(x+1)); // Get address of dereferenced *(x+1)
    unsigned int* c = x+1; // Get address x+1

    printf("%p\n", (void *)a); // %p with a void* is the correct way to print a pointer
    printf("%p\n", (void *)b);
    printf("%p\n", (void *)c);

}

Answer 1:


Note that gcc -O0 really only disables optimization across statements, and disables only some within statements. See Disable all optimization options in GCC.

Within a single statement, it still does some of its usual optimizations, including multiplicative inverses for division by non-power-of-2 constants.

Some other compilers do more braindead transliteration of C into asm with optimization disabled, e.g. MSVC will sometimes put a constant into a register and compare it against another constant, with two immediates. GCC never does anything that dumb; it evaluates constant expressions as far as possible and removes always-false branches.

If you want a very literal-minded compiler, take a look at TinyCC, a one-pass compiler.


In this case: The ISO C standard defines all of those in terms of x+1

x[y] is syntactic sugar for *(x+y), so ISO C only has to define the rules for pointer math: the + operator between pointer and integer types. + is commutative (x+y and y+x are exactly equivalent), so it's not surprising that variations on that boil down to the same thing. In your case, T x[10] decays to a T* for the pointer math.

&*x "cancels out": the ISO C abstract machine never truly references the *x object, so this is safe even if x is a NULL pointer or pointing past the end of an array or whatever. That's why this takes the address of the array element, not of some temporary *x object. So this is the kind of thing compilers need to sort out before doing code-gen, not just evaluate *x with a mov load. Because then what? Having the value in a register doesn't help you take the address of the original location.


Nobody expects truly efficient code from -O0 (part of the goal is to compile fast, as well as consistent debugging), but gratuitous random extra instructions would be unwelcome even in cases where they're not dangerous.

GCC actually transforms source through GIMPLE and RTL internal representations of the program logic. It's probably during those passes where different C ways of expressing the same logic tend to become identical.

That said, it's somewhat surprising that gcc does lea rax, [rbp-80] / add rax, 4 instead of folding the + 1*sizeof(unsigned) into the LEA. It would of course do that if you used optimization. (and volatile unsigned int* to force it to still materialize the unused variables, if you want it to work without the code bloat of the printf calls.)


Other compilers:

MSVC does have some differences: https://godbolt.org/z/xoMfT4

;; x86-64 MSVC
        sub     rsp, 88                ; Windows x64 doesn't have a red zone
...
//   unsigned int* a = &x[1]; // Get address of dereferenced x[1]
        mov     eax, 4                          ; even dumber than GCC
        imul    rax, rax, 1                     ; sizeof(unsigned) * 1  I guess?
        lea     rax, QWORD PTR x$[rsp+rax]
        mov     QWORD PTR a$[rsp], rax
//   unsigned int* b = &(*(x+1)); // Get address of dereferenced *(x+1)
        lea     rax, QWORD PTR x$[rsp+4]         ; smarter than GCC
        mov     QWORD PTR b$[rsp], rax
//   unsigned int* c = x+1; // Get address x+1
        lea     rax, QWORD PTR x$[rsp+4]
        mov     QWORD PTR c$[rsp], rax
...

c$[rsp] is just [16 + rsp], given the c$ = 16 assemble-time constant it defined earlier.

ICC and clang compile all versions the same way.

MSVC for AArch64 avoids the multiply (and uses hex literals instead of decimal). But like x86-64 GCC, it gets the array base address into a register and then adds 4. https://godbolt.org/z/ThPxx9

@@ AArch64 MSVC
...
        sub         sp,sp,#0x40
...
//   unsigned int* a = &x[1]; // Get address of dereferenced x[1]
        add         x8,sp,#0x20
        add         x8,x8,#4
        str         x8,[sp]
//    unsigned int* b = &(*(x+1)); // Get address of dereferenced *(x+1)
        add         x8,sp,#0x20
        add         x8,x8,#4
        str         x8,[sp,#8]
//    unsigned int* c = x+1; // Get address x+1
        add         x8,sp,#0x20
        add         x8,x8,#4
        str         x8,[sp,#0x10]
//    unsigned int* d = &1[x];
        add         x8,sp,#0x20
        add         x8,x8,#4
        str         x8,[sp,#0x18]

Clang uses the interesting strategy of getting the array base address into a register once, and adding to it for each statement. I guess it considers that x86-64 lea or AArch64 add x9, sp, #36 part of its prologue, if it wants to support debuggers that jump between source lines, and maybe it won't do this if there's any non-linear control flow in the function?




Answer 2:


Those three are all defined to be equivalent by the Standard:

  • It explicitly has a statement that &*(X) is equivalent to (X): neither operator is evaluated, though the result is not an lvalue.
  • A[B] is defined as *(A+B).

Combining the second rule with the first one, we get &(A[B]) being identical to (A+B).


In general, you will notice a bunch of other "optimizations" occur as well.

C is defined in terms of the observable behavior of an abstract machine. All programs which produce the same observable behavior are equivalent programs in the eyes of the standard.

The different optimization levels offered by a compiler cater to debuggability and compilation size/speed considerations; they aren't some intrinsic levels of the language or anything.



Source: https://stackoverflow.com/questions/65713833/generated-assembly-for-pointer-arithmetic