There are various open source assemblers such as gas, nasm, and yasm. They have different pseudo-ops and macro syntaxes. For many open source pro
I think "XY Problem" is the wrong description. The question is more a case of "Concept A is needed to evaluate Concept B".
Concept A: What is an assembler?
See: Assemblers and Loaders, by David Salomon. [some pearls of wisdom, some archaic trivia]
I very quickly discovered the lack of literature in this field. In strict contrast to compilers, for which a wide range of literature exists, very little has ever been written on assemblers and loaders.
An assembler consists of,
An assembler is generally a 1-1 translation. However, several variants of branches and calls often exist, generally known as the long and short versions. The opcode used depends on the distance to the destination; a two-pass assembler is needed to optimize forward branches, as alluded to by Harold.
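To make the forward-branch problem concrete, here is a minimal sketch of branch sizing in C. It assumes a made-up instruction set in which a short branch takes 2 bytes and reaches -128..127 bytes, and a long branch takes 5 bytes; `struct insn`, `offset_of` and `relax` are hypothetical names, not taken from any real assembler. The target of a forward branch has no address until everything before it has been sized, so the sizing pass is simply repeated until no branch needs widening.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encodings: a short branch is 2 bytes, a long one 5. */
enum { SHORT_LEN = 2, LONG_LEN = 5 };

struct insn {
    int    is_branch; /* nonzero if this instruction is a branch */
    size_t target;    /* index of the target instruction (branches only) */
    int    len;       /* encoded length in bytes */
};

/* Byte offset of instruction i under the current lengths (i may equal n). */
static long offset_of(const struct insn *prog, size_t i)
{
    long off = 0;
    for (size_t k = 0; k < i; k++)
        off += prog[k].len;
    return off;
}

/* Start with every branch short, widen the ones that cannot reach their
 * target, and repeat until nothing changes. Returns the total code size. */
static long relax(struct insn *prog, size_t n)
{
    int changed;

    for (size_t i = 0; i < n; i++) {
        if (prog[i].is_branch) {
            assert(prog[i].target <= n);
            prog[i].len = SHORT_LEN;
        }
    }
    do {
        changed = 0;
        for (size_t i = 0; i < n; i++) {
            if (!prog[i].is_branch || prog[i].len == LONG_LEN)
                continue;
            /* Displacement is measured from the end of the branch. */
            long disp = offset_of(prog, prog[i].target)
                      - (offset_of(prog, i) + prog[i].len);
            if (disp < -128 || disp > 127) {
                prog[i].len = LONG_LEN;
                changed = 1;
            }
        }
    } while (changed);
    return offset_of(prog, n);
}
```

A one-pass tool cannot do this: when it meets a forward branch it must commit to an encoding before the distance is known, so it either always uses the long form or leaves a hole to patch later.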
Concept B: Using the 'C' pre-processor as an assembler.
The best a 'C' pre-processor could emulate is a 1-pass assembler. A large class of CPUs/instructions can be encoded like this, although the macros could be cumbersome. There would be no listings or xrefs, but most people would not miss those features. The syntax would also be odd due to the limitations of the pre-processor. Address fix-ups would be difficult: labels would either have to re-use the 'C' symbol table via pointers, or each label offset would need a hand-coded #define. This restricts the approach to little more than a basic block.
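As a rough illustration of what such macros might look like, the sketch below encodes four 32-bit x86 instructions into a byte array. The opcode bytes (0xB8 for `mov eax, imm32`, 0xEB for a short `jmp`, 0x90 for `nop`, 0xC3 for `ret`) are the standard encodings, but the macro names and the label `L_DONE` are invented for this example. Note that the label offset has to be counted by hand, which is exactly the fix-up problem described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro "mnemonics"; each expands to a list of bytes. */
#define IMM32(v)         (uint8_t)((v) & 0xff), (uint8_t)(((v) >> 8) & 0xff), \
                         (uint8_t)(((v) >> 16) & 0xff), (uint8_t)(((v) >> 24) & 0xff)
#define MOV_EAX_IMM32(v) 0xB8, IMM32(v)        /* mov eax, imm32 */
#define JMP_SHORT(rel8)  0xEB, (uint8_t)(rel8) /* jmp rel8       */
#define NOP              0x90
#define RET              0xC3

/* The pre-processor cannot compute label addresses, so the offset of
 * the "done" label is counted by hand and must be kept in sync. */
#define L_DONE 9

static const uint8_t code[] = {
    MOV_EAX_IMM32(42u),    /* offset 0, 5 bytes                       */
    JMP_SHORT(L_DONE - 7), /* offset 5, 2 bytes; rel8 counts from 7   */
    NOP, NOP,              /* offsets 7 and 8, skipped by the jump    */
    RET                    /* offset 9: the hand-counted label L_DONE */
};

/* Catches a stale hand count if an instruction is added or removed. */
static_assert(sizeof code == 10, "hand-counted offsets are stale");
```

Inserting a single instruction before the `ret` means recounting `L_DONE` and every displacement that depends on it, which is why this only stays manageable for short, straight-line sequences.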
Large assembler routines such as YUV/RGB transforms or MP3 decoding are highly unlikely to be used this way.
Multiple architecture code is quite common. For example, an ARM wifi chip may have its code embedded in a Linux kernel as firmware. It is possible that this technique could be useful here. However, using separate compilers/assemblers for the different architectures and then using objcopy
to embed them is far more sane.
This is probably the most useful. In fact, many tools, such as linkers and loaders, have high-level functions which patch code at run time. It could also be used to conditionally change a routine at run time; however, function pointers are almost as fast and easier to understand, and they avoid the cache coherency issues that come with modifying code in place.
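For comparison, the function-pointer alternative can be sketched as follows. All names here (`sum_fn`, `sum_simple`, `sum_unrolled`, `select_sum`) are hypothetical, and the "fast path" is just an unrolled loop standing in for whatever routine would otherwise be patched in.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*sum_fn)(const int *v, size_t n);

/* Baseline routine. */
static long sum_simple(const int *v, size_t n)
{
    long s = 0;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a tuned variant: four elements per iteration. */
static long sum_unrolled(const int *v, size_t n)
{
    long s = 0;
    size_t i = 0;
    assert(v != NULL || n == 0);
    for (; i + 4 <= n; i += 4)
        s += (long)v[i] + v[i + 1] + v[i + 2] + v[i + 3];
    for (; i < n; i++) /* leftover elements */
        s += v[i];
    return s;
}

/* Callers always go through this pointer; no code bytes are rewritten. */
static sum_fn sum_impl = sum_simple;

/* Choose the routine once, e.g. after probing CPU features at start-up. */
static void select_sum(int have_fast_path)
{
    sum_impl = have_fast_path ? sum_unrolled : sum_simple;
}
```

The cost is one indirect call per use; in exchange nothing in the instruction stream changes, so there is no instruction cache to flush or synchronize.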
See also: Gold Blog, by Ian Lance Taylor. [although he uses
]