Using GCC's pre-processor as an assembler

别跟我提以往 2020-12-10 08:35

There are various open source assemblers such as gas, nasm, and yasm. They have different pseudo-ops and macro syntaxes. For many open source pro

2 answers
  •  北海茫月
    2020-12-10 09:01

    I think that "XY Problem" is the wrong description. The question is more a case of "Concept A is needed to evaluate Concept B".


    Concept A: What is an assembler?

    See: Assemblers and Loaders, by David Salomon. [some pearls of wisdom, some archaic trivia]

    I very quickly discovered the lack of literature in this field. In strict contrast to compilers, for which a wide range of literature exists, very little has ever been written on assemblers and loaders.

    An assembler consists of:

    • A symbol table, which facilitates linking through some object format.
    • A lexer and parser for converting the text to a data structure or directly to machine code.
    • Two passes, for the most efficient branch and subroutine calls.
    • An opcode table.

    An assembler is generally a 1-to-1 translation. However, several variants of branches and calls often exist, generally known as the long and short versions. The opcode used depends on the distance to the destination, so a two-pass assembler is needed to optimize forward branches (as alluded to by Harold).
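The choice between short and long branches can be sketched as follows. This is only an illustration on a hypothetical toy instruction set (a 1-byte NOP, a 2-byte short jump with a signed 8-bit displacement, a 5-byte long jump); the names `relax`, `assign_addresses` and `widen_jumps` are made up for this sketch. It starts with every jump short and repeats the address pass until no jump needs widening, which is the iterative form of the two-pass idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy ISA, for illustration only. */
enum kind { INSN_NOP, INSN_JMP };

struct insn {
    enum kind kind;
    size_t target;  /* INSN_JMP only: index of the destination instruction */
    bool is_long;   /* INSN_JMP only: encoding chosen so far */
};

static size_t insn_size(const struct insn *i)
{
    if (i->kind == INSN_NOP)
        return 1;
    return i->is_long ? 5 : 2;
}

/* Address pass: addr[i] is the offset of instruction i, addr[n] the end. */
static void assign_addresses(const struct insn *prog, size_t n, long *addr)
{
    long pc = 0;
    for (size_t i = 0; i < n; i++) {
        addr[i] = pc;
        pc += (long)insn_size(&prog[i]);
    }
    addr[n] = pc;
}

/* Widen every short jump whose displacement (measured from the next
 * instruction) does not fit in 8 bits. Returns true if anything changed. */
static bool widen_jumps(struct insn *prog, size_t n, const long *addr)
{
    bool changed = false;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].kind != INSN_JMP || prog[i].is_long)
            continue;
        long disp = addr[prog[i].target] - addr[i + 1];
        if (disp < -128 || disp > 127) {
            prog[i].is_long = true;
            changed = true;
        }
    }
    return changed;
}

/* Start optimistic (all short) and repeat until the layout is stable.
 * addr must have room for n + 1 entries. Returns the total code size. */
static long relax(struct insn *prog, size_t n, long *addr)
{
    for (size_t i = 0; i < n; i++)
        prog[i].is_long = false;
    do {
        assign_addresses(prog, n, addr);
    } while (widen_jumps(prog, n, addr));
    return addr[n];
}
```

Jumps only ever grow in this sketch, so the loop terminates. A one-pass tool cannot do this for a forward jump, because the destination address is not yet known when the jump is encoded.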


    Concept B: Using the 'C' pre-processor as an assembler.

    The best a 'C' pre-processor could emulate is a 1-pass assembler. A large class of CPUs/instructions can be encoded like this, although the macros could be cumbersome. There would be no listings or xrefs, but most people would not miss those features. Also, the syntax would be odd due to the limitations of the pre-processor. Address fix-ups would be difficult to deal with: labels would either have to re-use the 'C' symbol table through pointers, or rely on a hand-coded #define for each label offset. This limits the approach to little more than a basic block.
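A minimal sketch of what such macros might look like, assuming 32-bit x86 as the target. The macro names (`MOV_EAX_IMM32`, `JMP_REL8`, `L_DONE` and so on) are invented for this example; only the opcode bytes are the standard x86 encodings. The bytes are merely placed in an array here, not executed.

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit immediate as four little-endian bytes. */
#define IMM32(v) (uint8_t)((uint32_t)(v) & 0xff),         \
                 (uint8_t)(((uint32_t)(v) >> 8) & 0xff),  \
                 (uint8_t)(((uint32_t)(v) >> 16) & 0xff), \
                 (uint8_t)(((uint32_t)(v) >> 24) & 0xff)

/* Hypothetical "mnemonics"; each expands to the instruction's bytes. */
#define MOV_EAX_IMM32(v) 0xB8, IMM32(v)      /* mov eax, imm32 */
#define ADD_EAX_IMM32(v) 0x05, IMM32(v)      /* add eax, imm32 */
#define JMP_REL8(d)      0xEB, (uint8_t)(d)  /* jmp short      */
#define NOP              0x90
#define RET              0xC3

/* Hand-coded label offset: the pre-processor cannot compute this, so it
 * must be kept in sync with the code below by hand. */
#define L_DONE 9

static const uint8_t code[] = {
    MOV_EAX_IMM32(40),     /* offset 0,  5 bytes                   */
    JMP_REL8(L_DONE - 7),  /* offset 5,  2 bytes; next insn at 7   */
    NOP, NOP,              /* offsets 7 and 8, jumped over         */
    ADD_EAX_IMM32(2),      /* offset 9 = L_DONE                    */
    RET                    /* offset 14                            */
};
```

The fragile part is the jump: both `L_DONE` and the `7` are offsets counted by hand, and inserting a single instruction before the label silently invalidates them. A real assembler's symbol table does this bookkeeping automatically.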

    Large assembler Routines

    Large assembler routines such as YUV/RGB transforms or MP3 decoding are highly unlikely to be used this way.

    Multi-arch code

    Multiple architecture code is quite common. For example, an ARM wifi chip may have its code embedded in a Linux kernel as firmware. It is possible that this technique could be useful here. However, using separate compilers/assemblers for the different architectures and then using objcopy to embed the results is far more sane.

    Self-modifying Code

    This is probably the most useful case. In fact, many tools, such as linkers and loaders, have high-level functions which patch code at run time. The technique could also be used to conditionally change a routine at run time; however, function pointers are almost as fast and easier to understand, and they avoid the cache coherency issues of self-modifying code.
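For comparison, the function-pointer alternative might look like the sketch below. The routine names (`sum_plain`, `sum_unrolled`, `select_sum`) are hypothetical and stand in for whatever routine would otherwise be patched in place.

```c
#include <assert.h>
#include <stddef.h>

/* Two interchangeable implementations of the same routine. */
static unsigned sum_plain(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

static unsigned sum_unrolled(const unsigned char *p, size_t n)
{
    unsigned s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += p[i] + p[i + 1] + p[i + 2] + p[i + 3];
    for (; i < n; i++)  /* leftover bytes */
        s += p[i];
    return s;
}

/* The "patch point" is just data: callers always go through this pointer. */
static unsigned (*sum_bytes)(const unsigned char *, size_t) = sum_plain;

/* Switch implementations at run time without touching any code bytes. */
static void select_sum(int prefer_unrolled)
{
    sum_bytes = prefer_unrolled ? sum_unrolled : sum_plain;
}
```

Callers write `sum_bytes(buf, len)` and pay for one indirect call. Nothing in the instruction stream changes, so no instruction-cache flush is needed.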

    See also: Gold Blog, by Ian Lance Taylor. [although he uses ]
