Question
I wrote a simple C++ function in order to check compiler optimization:
bool f1(bool a, bool b) {
    return !a || (a && b);
}
After that I checked the equivalent in Rust:
fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}
I used Godbolt to inspect the assembly output.
The result of the C++ code (compiled by clang with the -O3 flag) is the following:
f1(bool, bool):                                # @f1(bool, bool)
    xor     dil, 1
    or      dil, sil
    mov     eax, edi
    ret
And the result of the Rust equivalent is much longer:
example::f1:
  push rbp
  mov rbp, rsp
  mov al, sil
  mov cl, dil
  mov dl, cl
  xor dl, -1
  test dl, 1
  mov byte ptr [rbp - 3], al
  mov byte ptr [rbp - 4], cl
  jne .LBB0_1
  jmp .LBB0_3
.LBB0_1:
  mov byte ptr [rbp - 2], 1
  jmp .LBB0_4
.LBB0_2:
  mov byte ptr [rbp - 2], 0
  jmp .LBB0_4
.LBB0_3:
  mov al, byte ptr [rbp - 4]
  test al, 1
  jne .LBB0_7
  jmp .LBB0_6
.LBB0_4:
  mov al, byte ptr [rbp - 2]
  and al, 1
  movzx eax, al
  pop rbp
  ret
.LBB0_5:
  mov byte ptr [rbp - 1], 1
  jmp .LBB0_8
.LBB0_6:
  mov byte ptr [rbp - 1], 0
  jmp .LBB0_8
.LBB0_7:
  mov al, byte ptr [rbp - 3]
  test al, 1
  jne .LBB0_5
  jmp .LBB0_6
.LBB0_8:
  test byte ptr [rbp - 1], 1
  jne .LBB0_1
  jmp .LBB0_2
I also tried the -O option, but then the output is empty (the unused function is deleted).
I am intentionally NOT using any library in order to keep the output clean. Please note that both clang and rustc use LLVM as a backend. What explains this huge difference in output? And if the only problem is that optimizations are switched off, how can I see optimized output from rustc?
Answer 1:
Compiling with the compiler flag -O (and with pub added to the function), I get this output on Godbolt:
push    rbp
mov     rbp, rsp
xor     dil, 1
or      dil, sil
mov     eax, edi
pop     rbp
ret
A few things:
- Why is it still longer than the C++ version?
  The Rust version is exactly three instructions longer:
      push rbp
      mov rbp, rsp
      [...]
      pop rbp
  These instructions manage the so-called frame pointer or base pointer (rbp). It is mainly needed to get nice stack traces. If you keep the frame pointer in the C++ version via -fno-omit-frame-pointer, you get the same result. Note that this uses g++ instead of clang++ since I haven't found a comparable option for the clang compiler.
- Why doesn't Rust omit the frame pointer?
  Actually, it does. But Godbolt adds an option to the compiler to preserve the frame pointer. You can read more about why this is done here. If you compile your code locally with
      rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel"
  you get this output:
      f1:
          xor dil, 1
          or dil, sil
          mov eax, edi
          ret
  Which is exactly the output of your C++ version.
  You can "undo" what Godbolt does by passing -C debuginfo=0 to the compiler.
- Why -O instead of --release?
  Godbolt uses rustc directly instead of cargo. The --release flag is a flag for cargo. To enable optimizations on rustc, you need to pass -O or -C opt-level=3 (or any other level between 0 and 3).
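As a rough sketch of the points above, here is what the source file might look like when built as a library with the rustc command quoted in this answer. The file name foo.rs comes from that command. The helpers f1_simplified and forms_agree are illustrative names that are not in the original post; they are only there to show that the optimized xor/or sequence corresponds to the simpler expression !a | b.

```rust
// foo.rs (file name taken from the rustc command quoted in the answer).
// Build with:
//   rustc -O --crate-type=lib foo.rs --emit asm -C "llvm-args=-x86-asm-syntax=intel"

/// `pub` keeps the function in the library's output. A private, unused
/// function can be removed entirely once optimizations are enabled.
pub fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}

/// Hand-simplified form (illustrative helper, not part of the original post).
/// It mirrors the optimized assembly: `xor dil, 1` computes `!a`,
/// and `or dil, sil` combines that with `b`.
pub fn f1_simplified(a: bool, b: bool) -> bool {
    !a | b
}

/// Compares both forms on all four input combinations
/// (illustrative helper, not part of the original post).
pub fn forms_agree() -> bool {
    [false, true].iter().all(|&a| {
        [false, true]
            .iter()
            .all(|&b| f1(a, b) == f1_simplified(a, b))
    })
}
```

Since forms_agree covers every possible input, the two expressions are interchangeable, which is why LLVM is free to emit the short sequence for both the C++ and the Rust version.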
Answer 2:
Compiling with -C opt-level=3 in godbolt gives:
example::f1:
  push rbp
  mov rbp, rsp
  xor dil, 1
  or dil, sil
  mov eax, edi
  pop rbp
  ret
Which looks comparable to the C++ version. See Lukas Kalbertodt's answer for more explanation.
Note: I had to make the function pub extern to stop the compiler optimising it to nothing, as it is unused.
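For illustration, a minimal sketch of the exported variant mentioned in the note. The explicit "C" ABI string is an assumption on my part (a bare extern defaults to it); the function body is unchanged from the question.

```rust
// Exported with the C calling convention so that the function is part of the
// crate's public interface and is not discarded as unused.
// `extern "C"` only selects the ABI; the logic is the same as in the question.
pub extern "C" fn f1(a: bool, b: bool) -> bool {
    !a || (a && b)
}
```

From Rust code the function can still be called like any other, for example f1(true, false).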
Answer 3:
To get the same asm code, you need to disable debug info. This removes the frame pointer pushes.
-C opt-level=3 -C debuginfo=0 (https://godbolt.org/g/vdhB2f)
Answer 4:
It doesn't generate much more assembly (the actual difference is much smaller than shown in the question). I'm surprised nobody checked the C++ output:
godbolt C++ x64 clang 4.0, no compiler options
godbolt Rust 1.18, no compiler options
Source: https://stackoverflow.com/questions/45562164/why-does-this-code-generate-much-more-assembly-than-equivalent-c-clang