Reusing symbol table from semantic analysis phase for code generation

Question

I'm currently building a compiler for a language that has global variables and nested subroutines. Previously, I've only built compilers for languages that have only local variables and no nested subroutines.

My problem is how to reuse the symbol table filled during the semantic analysis phase in the code generation phase. I implemented the symbol table as a stack of linked lists, where each linked list holds the identifiers declared in a particular scope. Every time the analyzer enters a scope, a new list is created and pushed onto the stack, and it becomes the current scope. Likewise, every time it leaves a scope, the list on top of the stack is popped. In the end, after semantic analysis finishes, I effectively have an empty symbol table, just like when it started. However, the code generator needs a completely filled symbol table to generate code correctly. How can this be done without redoing what was already done during semantic analysis (i.e. entering identifiers into the symbol table)?

Answer 1:

This is going to be a bit abstract, like your question, since I don't know anything concrete about your compiler's internal data structures.

When you pop a scope, don't delete it, as I assume you do now. Instead, store the pointer to the scope data in a member of the structure that code generation for that scope is based on (for example, the AST node for the subroutine), so that the code generator can reach it.
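A minimal sketch of that idea, in Python for brevity. All the names here (`Scope`, `SubroutineNode`, `SymbolTable`, `leave_scope`) are made up for illustration and are not taken from the asker's compiler. It assumes each scope keeps a link to its enclosing scope, so lookups still work after the stack has been emptied.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """Identifiers declared in one scope, plus a link to the enclosing scope."""
    name: str
    parent: Optional["Scope"] = None
    symbols: dict = field(default_factory=dict)

    def lookup(self, ident):
        # Walk outward through enclosing scopes until the name is found.
        scope = self
        while scope is not None:
            if ident in scope.symbols:
                return scope.symbols[ident]
            scope = scope.parent
        return None


@dataclass
class SubroutineNode:
    """Hypothetical AST node; `scope` is filled in when the scope is left."""
    name: str
    scope: Optional[Scope] = None


class SymbolTable:
    def __init__(self):
        self.stack = []

    def enter_scope(self, name):
        parent = self.stack[-1] if self.stack else None
        self.stack.append(Scope(name, parent))

    def declare(self, ident, info):
        self.stack[-1].symbols[ident] = info

    def leave_scope(self, node):
        # Pop as before, but hand the scope to the AST node instead of discarding it.
        node.scope = self.stack.pop()


# Semantic analysis of an outer subroutine containing a nested one.
table = SymbolTable()
outer = SubroutineNode("outer")
inner = SubroutineNode("inner")

table.enter_scope("outer")
table.declare("x", "integer")
table.enter_scope("inner")
table.declare("y", "real")
table.leave_scope(inner)
table.leave_scope(outer)
```

After analysis the stack is empty again, but the code generator can still call `inner.scope.lookup("x")` and find the declaration from the enclosing subroutine, because each node holds its own scope and each scope holds its parent.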

Answer 2:

You have to decide how much context your compiler is going to retain to support optimization and code generation.

You can build a purely on-the-fly code generator that throws away symbol table information on leaving a scope, provided it has already generated all the code (or the IR) that it is going to generate for that scope. This can work if you are building a quick-and-dirty compiler, and it is useful when your computer doesn't have a lot of memory. (On modern PCs, the memory argument no longer applies.)

If you don't do any code analysis, optimization, IR generation, or code generation until you reach the end of the parsing process, then you'll have to hang onto the per-scope symbol table information longer. You'll discover in this case that you have to hang onto the ASTs too, or you'll have nothing to generate code from. (On modern PCs, this is not an issue.)

To build a compiler with a simple architecture, you probably want to isolate the parsing, semantic analysis, and code generation passes anyway. In this case, your parser runs and just builds an AST; don't bother building a symbol table. Pass 2 walks the tree and builds symbol tables that correspond to parts of the AST, and keeps that relationship; now you have ASTs and associated symbol tables. Pass 3 can now walk the ASTs and use the symbol information to generate an IR. Pass 4 optimizes the IR; it may still reference symbol table entries decorated with type information and possibly storage location assignments. After that, you can do any remaining optimizations and final code generation.
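Here is a rough sketch of passes 2 and 3 sharing symbol tables through the AST. The `Block` node, the `analyze`/`generate` functions, and the toy `load` instructions are all invented for this example; a real compiler would store types and storage locations rather than just a nesting depth.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    parent: Optional["Scope"] = None
    symbols: dict = field(default_factory=dict)  # name -> nesting depth of its declaration


@dataclass
class Block:
    """Hypothetical AST node: a subroutine with declarations, uses, and nested subroutines."""
    name: str
    decls: list
    uses: list
    children: list = field(default_factory=list)
    scope: Optional[Scope] = None  # filled in by pass 2


def analyze(block, parent=None, depth=0):
    """Pass 2: build one Scope per block and keep it attached to the AST node."""
    block.scope = Scope(parent, {name: depth for name in block.decls})
    for child in block.children:
        analyze(child, block.scope, depth + 1)


def resolve(scope, name):
    # Search the current scope first, then each enclosing scope in turn.
    while scope is not None:
        if name in scope.symbols:
            return scope.symbols[name]
        scope = scope.parent
    raise NameError(name)


def generate(block, out=None):
    """Pass 3: walk the same AST, reading the scopes saved by pass 2."""
    out = [] if out is None else out
    out.append(f"proc {block.name}")
    for name in block.uses:
        depth = resolve(block.scope, name)
        kind = "global" if depth == 0 else f"frame{depth}"
        out.append(f"  load {kind}.{name}")
    for child in block.children:
        generate(child, out)
    return out


# "main" declares global g; nested "helper" declares local t and uses both.
program = Block("main", decls=["g"], uses=["g"],
                children=[Block("helper", decls=["t"], uses=["g", "t"])])
analyze(program)
ir = generate(program)
```

Nothing is re-entered into a symbol table during pass 3: it only reads `block.scope`, which pass 2 left behind on every node.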

The main point of all this is, don't throw the symbol tables away. Save them and associate them with the code structures you need for code generation. You have lots of memory to save them in.

Source: https://stackoverflow.com/questions/35185532/reusing-symbol-table-from-semantic-analysis-phase-for-code-generation

Tags

compiler-construction

code-generation

code-reuse

semantic-analysis

symbol-table