问题
Let's say I'm implementing a bytecode compiler, similar to Lua's/Python's... so on.
I'm traversing the AST, generating bytecode instructions, and I run into a break
inside of an if-else
inside of a while
loop:
while (cond1)
if (cond2)
...
else
break
(I tried writing out equivalent bytecode but it didn't look too helpful.)
The point is, there are at least 4 "jump" instructions in that example, and I
want to find an elegant solution to filling in the jump addresses as I compile the AST...
I don't know the jump address for the while
loop or the break
until after I've fully
"compiled" the inner statements.
- A pseudocode solution would be ideal
- A solution shouldn't depend on whether I'm implementing a register- or stack-based bytecode compiler (I'm playing with both)
I'm not reading the dragon book just yet.
If I'm recursively compiling the AST, when I reach a break
statement inside of some arbitrary number of loops and if-else
blocks, how should the compiler know which empty label to jump to? I'm guessing some type of label-name stack external to the recursive AST walking function.
回答1:
The principle you need is called "backpatching": fill in a dummy value for the forward jump, generate the code for the body of the statement, and then go back and replace the dummy value with the real one at the end.
e.g.
# start of codegen for 'while'
L1:
[evaluate cond1]
jne L2 # forward reference: use a dummy value for now
# start of codegen for 'if ... else'
L3:
[evaluate cond2]
jne L4 # another forward reference: use a dummy value for now
[code in 'if' branch]
j L5 # another forward reference: use a dummy value for now
L4:
[code in 'else' branch before 'break']
j L2
[code in 'else' branch after 'break']
L5: # end of 'if ... else'
# now go back and fill in references to L4, L5 inside the 'if ... else' block
# end of codegen for 'if ... else'
# codegen for 'while' continues...
j L1 # loop
L2: # end of 'while' loop
# now go back and fill in references to L2 inside the 'while' block
# end of codegen for 'while'
Regarding your edit:
If I'm recursively compiling the AST, when I reach a
break
statement inside of some arbitrary number of loops andif-else
blocks, how should the compiler know which empty label to jump to? I'm guessing some type of label-name stack external to the recursive AST walking function.
The target of the jump which implements the break
statement is the end of the innermost enclosing loop; yes, you could track that with an explicit external stack.
But, if you have a recursive AST walking function, you already have an implicit stack - the call frames of the recursive function invocations - so you probably don't need an explicit one as well.
e.g. in
...
while (outer)
...
if (outer_done)
break
...
while (inner)
...
if (inner_done)
break
...
[break from inner 'while' ends up here]
...
if (outer_done_2)
break
...
[break from outer 'while' ends up here]
...
the entirety of the code generation for the inner while
loop happens within a recursive walk of the AST for the body of the outer while
loop. Inside this recursive call, you only need to care about the inner loop, not the outer one.
So, all you need to do is:
- save any current backpatching state at the start of processing a
while
- start a new empty list for tracking any
break
that appears within the body of thiswhile
- handle the body (which may result in a recursive call)
- apply the backpatches
- then restore the previous state before exiting.
e.g. something a bit like this:
codegen for while:
save previous break backpatch list
initialise break backpatch list as empty
perform codegen for evaluating condition
perform codegen for body statements
apply backpatches
restore previous break backpatch list
The current break
backpatch list must be part of some state which is passed to all the codegen functions (the codegen for break
needs to append to it). But the saved list could just be tracked as a local variable of the while
codegen function.
来源:https://stackoverflow.com/questions/13094001/intelligent-solution-to-computing-jump-addresses-in-a-bytecode-compiler