I'm primarily interested in popular and widely used compilers, such as gcc. But if things are done differently with different compilers, I'd like to know that, too.
Visual C++ has a switch to output assembly code, so I think it generates assembly code before outputting machine code.
You'd probably be interested in listening to this podcast: Internals of GCC
gcc actually produces assembler and assembles it using the as assembler. Not all compilers do this - the MS compilers produce object code directly, though you can make them generate assembler output. Translating assembler to object code is a pretty simple process, at least compared with compilation.
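To give a feel for why assembling is simple compared with compiling, here is a minimal sketch of a two-pass assembler for a made-up machine. The instruction set, the opcode values, and the `assemble` function are all hypothetical. They do not correspond to any real CPU or to how `as` is actually implemented.

```python
# Toy two-pass assembler for a made-up machine (hypothetical opcodes).
# Each instruction is one opcode byte followed by one operand byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "JMP": 0x04, "HALT": 0xFF}

def assemble(source: str) -> bytes:
    lines = []
    for raw in source.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments and blank lines
        if line:
            lines.append(line)

    # Pass 1: record the byte address of every label.
    labels = {}
    address = 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            address += 2

    # Pass 2: translate each mnemonic and operand into bytes.
    out = bytearray()
    for line in lines:
        if line.endswith(":"):
            continue
        parts = line.split()
        mnemonic = parts[0]
        operand = parts[1] if len(parts) > 1 else "0"
        value = labels[operand] if operand in labels else int(operand, 0)
        out += bytes([OPCODES[mnemonic], value & 0xFF])
    return bytes(out)

program = """
start:
    LOAD 10      ; load from address 10
    ADD 11
    STORE 12
    JMP start
    HALT
"""
machine_code = assemble(program)
print(machine_code.hex(" "))
```

The whole job is table lookup plus label bookkeeping. There is no parsing of nested expressions, no type checking, and no optimization, which is where a compiler spends most of its effort.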
Some compilers produce other high-level language code as their output - for example, cfront, the first C++ compiler, produced C as its output, which was then compiled by a C compiler.
Note that neither direct compilation nor assembly actually produces an executable. That is done by the linker, which takes the various object code files produced by compilation/assembly, resolves all the names they contain and produces the final executable binary.
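The name-resolution step can be sketched roughly as follows. This is a toy model, not a real object-file format: the `ObjectFile` layout, the `link` function, and the byte values are invented for illustration, and real linkers (ELF, PE, Mach-O) deal with sections, relocation types, and much more.

```python
# Toy linker: concatenates "object files" and patches in symbol addresses.
# The object-file layout here is invented for illustration only.
from dataclasses import dataclass, field

@dataclass
class ObjectFile:
    code: bytearray                                   # machine code with placeholder bytes
    defines: dict = field(default_factory=dict)       # symbol name -> offset in code
    relocations: list = field(default_factory=list)   # (offset in code, symbol name)

def link(objects):
    # Step 1: lay the object files out one after another.
    bases, image = [], bytearray()
    for obj in objects:
        bases.append(len(image))
        image += obj.code

    # Step 2: build a global symbol table with final addresses.
    symbols = {}
    for base, obj in zip(bases, objects):
        for name, offset in obj.defines.items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = base + offset

    # Step 3: patch each placeholder with the resolved address.
    for base, obj in zip(bases, objects):
        for offset, name in obj.relocations:
            if name not in symbols:
                raise ValueError(f"undefined symbol: {name}")
            image[base + offset] = symbols[name]
    return bytes(image), symbols

# main.o refers to helper, which is defined in util.o.
main_o = ObjectFile(bytearray([0x10, 0x00, 0xFF]), {"main": 0}, [(1, "helper")])
util_o = ObjectFile(bytearray([0x20, 0x21]), {"helper": 0})
executable, symbols = link([main_o, util_o])
print(executable.hex(" "), symbols)
```

Neither object file is runnable on its own here: `main_o` contains a placeholder where the address of `helper` should be, and only the link step knows where `helper` ends up.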
None of the answers clarifies the fact that an ASSEMBLER is the first layer of abstraction between BINARY CODE and MACHINE DEPENDENT SYMBOLIC CODE. A compiler is the second layer of abstraction between MACHINE DEPENDENT SYMBOLIC CODE and MACHINE INDEPENDENT SYMBOLIC CODE.
If a compiler directly converts code to binary code, by definition, it will be called assembler and not a compiler.
It is more appropriate to say that a compiler uses INTERMEDIATE CODE which may or may not be assembly language e.g. Java uses byte code as intermediate code and byte code is assembler for java virtual machine (JVM).
EDIT: You may wonder why an assembler always produces machine dependent code and why a compiler is capable of producing machine independent code. The answer is very simple. Assembly language maps directly onto machine code, so the code an assembler produces is always machine dependent. By contrast, we can write more than one version of a compiler, one for each machine. So to run our code on a given machine, we compile the same source with the compiler version written for that machine.
Almost all compilers, including gcc, produce assembly code because it's easier, both to produce and when debugging the compiler. The major exceptions are usually just-in-time compilers or interactive compilers, whose authors don't want the performance overhead or the hassle of forking a whole process to run the assembler. Some interesting examples include:
Standard ML of New Jersey, which runs interactively and compiles every expression on the fly.
The tinycc compiler, which is designed to be fast enough to compile, load, and run a C script in well under 100 milliseconds, and therefore doesn't want the overhead of calling the assembler and linker.
What these cases have in common is a desire for "instantaneous" response. Assemblers and linkers are plenty fast, but not quite good enough for interactive response. Yet.
There is also a large family of languages, such as Smalltalk, Java, and Lua, which compile to bytecode, not assembly code, but whose implementations may later translate that bytecode directly to machine code without benefit of an assembler.
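As a small illustration of the compile-to-bytecode route, here is a sketch using CPython, which is not one of the languages named above but follows the same general pattern of compiling source to bytecode for a virtual machine. It uses only the standard `compile`, `eval`, and `dis` facilities. The exact opcode names printed vary between Python versions.

```python
import dis

# Compile a small expression to a CPython code object (bytecode, not assembly).
code = compile("a + b * 2", "<example>", "eval")

# List the bytecode instructions; exact opcode names vary between Python versions.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# The virtual machine executes the bytecode directly; no assembler is involved.
result = eval(code, {"a": 1, "b": 3})
print(result)
```

No assembly text and no external assembler appear anywhere in this path: the compiler emits bytecode as a byte string (`code.co_code`) and the runtime consumes it.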
(Footnote: in the early 1990s, Mary Fernandez and I wrote the New Jersey Machine Code Toolkit, for which the code is online, which generates C libraries that compiler writers can use to bypass the standard assembler and linker. Mary used it to roughly double the speed of her optimizing linker when generating a.out. If you don't write to disk, speedups are even greater...)
GCC compiles to assembler. Some other compilers don't. For example, LLVM-GCC compiles to LLVM assembly or LLVM bytecode, which is then compiled to machine code. Almost all compilers have some sort of internal representation: LLVM-GCC uses LLVM and, IIRC, GCC uses something called GIMPLE.
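To make "internal representation" a bit more concrete, here is a minimal sketch that lowers an arithmetic expression into three-address-style instructions, where each instruction has at most one operator and results go into numbered temporaries. This is neither GIMPLE nor LLVM IR; the `lower` function and the `t1`, `t2`, ... naming are made up for illustration, and it borrows Python's `ast` module as a ready-made parser.

```python
import ast
import itertools

def lower(expression: str):
    """Lower an arithmetic expression to a list of three-address instructions."""
    ops = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}
    counter = itertools.count(1)
    instructions = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            # Operands are lowered first, then combined into a fresh temporary.
            left = visit(node.left)
            right = visit(node.right)
            temp = f"t{next(counter)}"
            instructions.append(f"{temp} = {left} {ops[type(node.op)]} {right}")
            return temp
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    result = visit(ast.parse(expression, mode="eval").body)
    return instructions, result

ir, result = lower("a + b * (c - 2)")
for line in ir:
    print(line)
```

A flat form like this is easier for later passes to analyze and transform than the nested source expression, and it can then be turned into assembly, bytecode, or machine code by a separate back end.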