Compiling a C program is a process made up of four separate stages:
1. Preprocessing
2. Compilation
3. Assembly
4. Linking
In this post, I'll walk through the four stages of compiling using the following C program.
    /*
     * "Hello, World!": A classic.
     */

    #include <stdio.h>

    int main(void)
    {
        puts("Hello, World!");
        return 0;
    }
1. Preprocessing
Q: What does the preprocessor do?
A: It interprets the lines starting with a # character.
In this stage, lines starting with a '#' symbol are interpreted by the preprocessor as preprocessor commands, which form a simple macro language with its own syntax and semantics.
In more detail, the preprocessor does the following:
1. Expanding macros defined with #define (textual replacement).
2. Interpreting conditional compilation directives (#if, #elif, #else, etc.).
3. Interpreting #include directives and inserting the contents of the included header files.
4. Stripping comments.
5. Joining continued lines (lines ending with a \).
6. Adding line numbers and file identifiers (to ensure the compiler can report the exact location of errors).
7. Keeping #pragma directives, which are meant for the compiler.
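Several of the items above can be seen at work in a small sketch like the following. The names SQUARE, GREETING, BUILD_MODE and the three helper functions are made up for illustration; they are not part of the hello-world program.

```c
#include <stddef.h>
#include <string.h>

/* Macro with a parameter: the preprocessor substitutes text, so the
 * parentheses are needed to keep operator precedence intact. */
#define SQUARE(x) ((x) * (x))

/* A definition continued onto a second line with a trailing backslash.
 * After joining, the two adjacent string literals become one. */
#define GREETING "Hello, " \
                 "World!"

/* Conditional compilation: only one branch reaches the compiler. */
#ifdef DEBUG
#define BUILD_MODE "debug"
#else
#define BUILD_MODE "release"
#endif

/* Hypothetical helpers that expose what the preprocessor produced. */
int square_of_sum(int a, int b) { return SQUARE(a + b); }
size_t greeting_length(void) { return strlen(GREETING); }
const char *build_mode(void) { return BUILD_MODE; }
```

Calling square_of_sum(1, 2) gives 9 because the macro expands to ((a + b) * (a + b)). Without the inner parentheses it would expand to a + b * a + b, which is a different expression.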
Q: What do we get now?
    [lines omitted for brevity]
    extern int __vsnprintf_chk (char * restrict, size_t, int, size_t, const char * restrict, va_list);
    # 493 "/usr/include/stdio.h" 2 3 4
    # 2 "hello_world.c" 2
    int main(void) {
     puts("Hello, World!");
     return 0;
    }
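The lines beginning with # in that output record which file and line the following code came from. Standard C offers a related mechanism in source code, the #line directive, which the sketch below uses to show how line and file information reaches the compiler. The file name renamed.c and both function names are hypothetical.

```c
/* From here on, the compiler behaves as if the next line were
 * line 100 of a file called "renamed.c". */
#line 100 "renamed.c"
int reported_line(void) { return __LINE__; }
const char *reported_file(void) { return __FILE__; }
```

Any error message for the code after the directive would be reported against renamed.c, which is the same idea that lets the compiler point at hello_world.c even though it only sees the preprocessed text.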
2. Compilation
Q: What happens in this step?
A: The high-level language (C code) is translated into assembly instructions.
Now we come to the second step: compilation. In this stage, the preprocessed code is translated into assembly instructions, a human-readable low-level language specific to the target processor. (Some compilers can also generate machine code directly, without producing intermediate assembly instructions and invoking the assembler.)
Q: What happened to our code?
A: The compiler generated lower-level code (assembly instructions). The hello_world.o file is only produced in the next stage, assembly.
        .section __TEXT,__text,regular,pure_instructions
        .macosx_version_min 10, 10
        .globl _main
        .align 4, 0x90
    _main:                                  ## @main
        .cfi_startproc
    ## BB#0:
        pushq %rbp
    Ltmp0:
        .cfi_def_cfa_offset 16
    Ltmp1:
        .cfi_offset %rbp, -16
        movq %rsp, %rbp
    Ltmp2:
        .cfi_def_cfa_register %rbp
        subq $16, %rsp
        leaq L_.str(%rip), %rdi
        movl $0, -4(%rbp)
        callq _puts
        xorl %ecx, %ecx
        movl %eax, -8(%rbp)                 ## 4-byte Spill
        movl %ecx, %eax
        addq $16, %rsp
        popq %rbp
        retq
        .cfi_endproc

        .section __TEXT,__cstring,cstring_literals
    L_.str:                                 ## @.str
        .asciz "Hello, World!"

    .subsections_via_symbols
3. Assembly
Q: What is the lowest-level language?
A: Machine code, stored in a binary file.
We've walked through the first two steps; now we reach the last translation step: assembly. During this stage, the assembler translates the assembly instructions into machine code, also called object code. The output consists of the actual instructions to be run by the target processor, and we now get a binary file called hello_world.o.
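To make the translation from assembly text to machine code concrete, here is a toy sketch that maps a few of the instructions from the listing above to their x86-64 byte encodings. The table and the assemble_one function are hypothetical and cover only four instructions; a real assembler also has to deal with labels, sections, relocations, and far more instruction forms.

```c
#include <stddef.h>
#include <string.h>

/* One table entry: an assembly instruction and its x86-64 encoding. */
struct encoding {
    const char *text;
    unsigned char bytes[4];
    size_t length;
};

static const struct encoding table[] = {
    { "pushq %rbp",      { 0x55 },             1 },
    { "movq %rsp, %rbp", { 0x48, 0x89, 0xe5 }, 3 },
    { "popq %rbp",       { 0x5d },             1 },
    { "retq",            { 0xc3 },             1 },
};

/* Copies the machine code for one instruction into out (which must
 * have room for 4 bytes) and returns the number of bytes written,
 * or 0 if the instruction is not in the table. */
size_t assemble_one(const char *instruction, unsigned char *out)
{
    size_t count = sizeof table / sizeof table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(instruction, table[i].text) == 0) {
            memcpy(out, table[i].bytes, table[i].length);
            return table[i].length;
        }
    }
    return 0;
}
```

For example, assemble_one("retq", buf) stores the single byte 0xc3 in buf. Bytes like these, not the text retq, are what the object file contains.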
Q: What was the compiler doing during the compilation and assembly steps?
A: Four main tasks:
a. Lexical analysis
b. Syntax analysis (parsing)
c. Semantic analysis
d. Optimizing and generating machine code
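The first of these tasks, lexical analysis, splits the source text into tokens. The following is a minimal sketch of that idea, not how any particular compiler implements it: the token kinds and the next_token and count_tokens functions are invented for illustration, and details such as escape sequences, comments, and multi-character operators are ignored.

```c
#include <ctype.h>
#include <stddef.h>

enum token_kind {
    TOK_IDENTIFIER, TOK_NUMBER, TOK_STRING, TOK_PUNCTUATION, TOK_END
};

struct token {
    enum token_kind kind;
    const char *start;   /* first character of the token */
    size_t length;       /* number of characters in the token */
};

/* Reads one token starting at *cursor and advances the cursor past it. */
struct token next_token(const char **cursor)
{
    const char *p = *cursor;
    struct token tok;

    while (isspace((unsigned char)*p)) p++;   /* skip whitespace */
    tok.start = p;

    if (*p == '\0') {
        tok.kind = TOK_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        tok.kind = TOK_IDENTIFIER;
        while (isalnum((unsigned char)*p) || *p == '_') p++;
    } else if (isdigit((unsigned char)*p)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)*p)) p++;
    } else if (*p == '"') {
        tok.kind = TOK_STRING;
        p++;
        while (*p != '\0' && *p != '"') p++;  /* no escape handling */
        if (*p == '"') p++;
    } else {
        tok.kind = TOK_PUNCTUATION;           /* single character */
        p++;
    }

    tok.length = (size_t)(p - tok.start);
    *cursor = p;
    return tok;
}

/* Counts the tokens in a source string, excluding the end marker. */
size_t count_tokens(const char *source)
{
    size_t n = 0;
    while (next_token(&source).kind != TOK_END) n++;
    return n;
}
```

Given the statement puts("Hello, World!"); this sketch produces five tokens: the identifier puts, an opening parenthesis, the string literal, a closing parenthesis, and a semicolon. Syntax analysis then works on this token sequence rather than on raw characters.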
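4. Linking
Q: What is a linker and how does it work?
A: The linker resolves references to code that lives in external files (such as the C library that provides puts) and combines that code with our object file to produce the final executable.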
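One way to see the division of labor is to declare puts by hand instead of including stdio.h, as in the sketch below (say_hello is a hypothetical helper). The compiler is satisfied with the declaration alone; the definition is not in this file at all, so it is the linker that has to find it in the C library.

```c
/* Declaration only: no <stdio.h>, and no definition of puts here.
 * The object file will contain an unresolved reference to puts. */
int puts(const char *s);

/* Hypothetical helper: calls the externally defined puts. */
int say_hello(void)
{
    return puts("Hello, World!");
}
```

If the declared function did not exist in any object file or library passed to the linker, compilation and assembly would still succeed, and the build would fail only at the link stage with an undefined symbol error.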
----------------------------------------------------------------------------------
Contact me if you have any questions:
jeffluo1999@outlook.com
Thanks for reading : )