From the Optimizing compiler article on Wikipedia:
"Compiler optimization is generally implemented using a sequence of optimizing transformations."
Compile with the -S switch to get the assembly code. This works at any optimization level. For instance, to get the assembly code generated at -O2, try:
g++ -S -O2 input.cpp
(use gcc instead of g++ for C sources). A corresponding input.s file will be generated containing the assembly code. Repeat this for any optimization level you want.
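For example, a quick sketch for keeping the per-level outputs side by side (the output file names are just a suggestion):
g++ -S -O0 input.cpp -o input_O0.s
g++ -S -O1 input.cpp -o input_O1.s
g++ -S -O2 input.cpp -o input_O2.s
g++ -S -O3 input.cpp -o input_O3.s
diff input_O1.s input_O2.s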
gcc -O1 -S test.c (note the capital O and capital S)
This site can also help you. You can use -O0, -O1, or whatever compiler options are suitable for what you want to see.
Example from that site (verified with both of the approaches above):
void maxArray(double* x, double* y) {
    for (int i = 0; i < 65536; i++) {
        if (y[i] > x[i]) x[i] = y[i];
    }
}
-O0 result:
maxArray(double*, double*):
        pushq   %rbp
        movq    %rsp, %rbp
        movq    %rdi, -24(%rbp)        # spill x to the stack frame
        movq    %rsi, -32(%rbp)        # spill y to the stack frame
        movl    $0, -4(%rbp)           # i = 0
        jmp     .L2
.L5:
        movl    -4(%rbp), %eax
        cltq
        leaq    0(,%rax,8), %rdx
        movq    -32(%rbp), %rax
        addq    %rdx, %rax
        movsd   (%rax), %xmm0          # xmm0 = y[i]
        movl    -4(%rbp), %eax
        cltq
        leaq    0(,%rax,8), %rdx
        movq    -24(%rbp), %rax
        addq    %rdx, %rax
        movsd   (%rax), %xmm1          # xmm1 = x[i]
        ucomisd %xmm1, %xmm0
        jbe     .L3                    # skip the store unless y[i] > x[i]
        movl    -4(%rbp), %eax
        cltq
        leaq    0(,%rax,8), %rdx
        movq    -24(%rbp), %rax
        addq    %rax, %rdx             # rdx = &x[i]
        movl    -4(%rbp), %eax
        cltq
        leaq    0(,%rax,8), %rcx
        movq    -32(%rbp), %rax
        addq    %rcx, %rax             # rax = &y[i]
        movq    (%rax), %rax
        movq    %rax, (%rdx)           # x[i] = y[i]
.L3:
        addl    $1, -4(%rbp)           # i++
.L2:
        cmpl    $65535, -4(%rbp)
        jle     .L5                    # loop while i <= 65535
        popq    %rbp
        ret
-O1 result:
maxArray(double*, double*):
        movl    $0, %eax               # rax holds the byte offset i*8
.L5:
        movsd   (%rsi,%rax), %xmm0     # xmm0 = y[i]
        ucomisd (%rdi,%rax), %xmm0     # compare against x[i]
        jbe     .L2
        movsd   %xmm0, (%rdi,%rax)     # x[i] = y[i]
.L2:
        addq    $8, %rax
        cmpq    $524288, %rax          # 65536 elements * 8 bytes
        jne     .L5
        rep; ret
GCC and Clang perform optimizations on an intermediate representation (IR), which can be printed after each optimization pass.
For GCC the switch is -fdump-tree-all (see http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html); with Clang it is -mllvm -print-after-all.
Clang and GCC offer many more options for analyzing the optimizations, and it is easy to turn an individual optimization on or off from the command line (http://gcc.gnu.org/onlinedocs/gcc-3.4.4/gcc/Optimize-Options.html, http://llvm.org/docs/Passes.html).
With Clang/LLVM you can also list the optimization passes that were performed, using the command-line option -mllvm -debug-pass=Structure.
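For instance, a minimal Clang sketch (the file name is a placeholder, and -debug-pass=Structure applies to LLVM's legacy pass manager, so very recent Clang versions may need a different flag):
clang -O2 -mllvm -print-after-all -c test.c        # print the IR after every LLVM pass (to stderr)
clang -O2 -mllvm -debug-pass=Structure -c test.c   # list the pass pipeline that was run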
If you would like to study compiler optimizations and are not tied to a particular compiler, take a look at the Clang/LLVM projects. Clang is a C compiler that can output LLVM IR, and LLVM's tools can then apply specific optimization passes individually.
Output LLVM IR:
clang test.c -S -emit-llvm -o test.ll
Perform optimization pass:
opt test.ll -<optimization_pass> -S -o test_opt.ll
Compile to assembly:
llc test.ll -o test.s
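As a concrete sketch, with the pass name chosen only for illustration (recent versions of opt use the -passes=<name> syntax; older versions use the legacy -<name> form shown above):
clang test.c -S -emit-llvm -o test.ll
opt test.ll -passes=mem2reg -S -o test_opt.ll   # promote stack slots to SSA values
llc test_opt.ll -o test.s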
The intermediate representation can be saved to files using the -fdump-tree-all switch. There are more fine-grained -fdump switches available; see the GCC manual for details. To be able to read these representations, take a look at the GCC internals manual.
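For example, a rough sketch of two common invocations (the exact dump file names depend on the GCC version):
gcc -O2 -fdump-tree-all -c test.c         # dumps every GIMPLE pass, e.g. test.c.*.original, test.c.*.optimized
gcc -O2 -fdump-tree-optimized -c test.c   # dumps only the final optimized GIMPLE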
Whilst it is possible to take a small piece of code and compile it with -S under a variety of options, the difficulty is understanding what actually changed. It only takes a small change to make the code quite different - one variable going into a register means that register is no longer available for something else, causing knock-on effects to all of the remaining code in the function.
I was comparing the code from two nearly identical functions earlier today (to do with a question on C++), and there was ONE difference in the source code. One change in which variable was used for the termination condition inside one for-loop led to many lines of assembler code changing, because the compiler decided to arrange the registers in a different way, using a different register for one of the main variables, and then everything else changed as a consequence.
I've seen cases where a small change to a function moves it from being inlined to not being inlined, which in turn makes big changes to ALL of the code in the program that calls that code.
So, yes, by all means, compile very simple code with different optimisations, and use -S to inspect the compiler-generated code. Then compare the different variants to see what effect each has. But unless you are used to reading assembler code and understand what you are actually looking for, it can often be hard to see the forest for the trees.
It is also worth considering that optimisation steps often work in conjunction - one step allows another step to do its work (inlining leads to branch merging, better register usage, and so on).
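As a small illustration of passes working together (the function names here are hypothetical, invented just for this sketch): once the helper below is inlined, constant propagation and branch folding can simplify the caller far beyond what either step could do alone. Compare gcc -O0 -S demo.c with gcc -O2 -S demo.c.
/* demo.c - hypothetical example: inlining enables follow-on optimizations */
static int is_positive(int v) {
    return v > 0;
}

int clamp_to_zero(int v) {
    /* after is_positive() is inlined, the call disappears and the branch
       typically reduces to a single compare (or a conditional move) */
    if (is_positive(v))
        return v;
    return 0;
}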