When to use an abstract or concrete syntax tree?

会有一股神秘感。 提交于 2019-12-05 09:39:27
Ira Baxter

An AST that models all the semantic details of the source language is all you need. By definition, if it does model the semantics correctly, and your langauge includes a ternary operator, then it will model the specific order in which the operators are applied (e.g, the results of the predecence modulo overrides such as parentheses) correctly too.

So your problem isn't in the AST. It is generating to another language using similar (ternary) operators whose precedence is different.

This is an age-old problem in code generation: the operators of the target don't quite match the operators of the source, and so the output can't be one-to-one. In your case, you should be able to solve the problem by generating PHP ternary operators with parentheses around them to control the order to achieve what the original semantics are, so this isn't a big problem.

In general, generating sequences of code that achieve a desired result can be pretty complicated, and there's lots of ways to do it. That's why the compiler books are thick rather than thin. You seem to have implicitly settled on "get AST, walk AST, spit code"; this is almost an on-the-fly code generator. And this works adequately if you don't care if the generated code is particularly good, and the target language is pretty close to the source language.

If the code generation problem is more complex, what normally happens is the AST is used to generate what amounts to a data flow model of the computation, consisting of operators that produce results, and consume results from previous operators, grounding out in "operators" that fetch variable values and constants. Then the data flow representation is traversed to generate the code; this has the advantage that you can choose an operator in the data flow representation, find a matching code sequence in the target language, generate that, and then worry about how the operands are collected. Better schemes match data flow subgraphs (representing equivalent compound target language constructs) to the produced data flow graph; this can produce significantly better code. Often, one can apply target-language specific optimizations after raw code generation to produce even better code. In both cases, you have to worry about managing the operator results; can they be fed directly to the next target language operator, or must they go into some kind of temporary storage (for machine code, this can be another register or a memory location). Doing all this isn't easy; again, that's why the compiler books aren't thin.

A variation on this idea is source-to-source program transformations. This maps constructs in source code "directly" to constructs in target code, although this is usually done behind the scenes by operating on ASTs because unparsed programming language text is hard to match. Our DMS Software Reengineering Toolkit is an example of this kind of system. With such a tool, you write patterns in the source language (which implicitly match against a parse tree), and corresponding patterns in the target langauge (implicitly producing target language ASTs). You can write complex source or target constructs giving much of the effect of the data flow graph matching above. Post-generation optimization consists of more rewrite rules that transform the target code to target code.

Bottom line: Having an AST isn't really enough unless your translation is truly trivial. You can read more about what you do need at this SO answer: https://stackoverflow.com/a/3460977/120163

WARNING: Loud opinion follows.

Re "Transcoder": I much prefer the term "compilation", "translation", or "source-to-source" compiler. I've been building program analysis and manipulation tools for almost 40 years. I'd never heard the term "transcoder" until I encountered this SO question: Experience migrating legacy Cobol/PL1 to Java and a response describing IMHO a truly awful code translation scheme called NACA. I've heard since that this term is gaining traction; I don't see why we had to invent another term when we have perfectly adequate ones. Usually this is a sign of somebody inventing a high priesthood; "let's invent a shiny new term so people don't really understand what we're doing". I'm happy to leave that term to such truly awful translations.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!