Intelligent solution to computing jump addresses in a bytecode compiler?

纵饮孤独 提交于 2019-12-08 17:41:48

问题


Let's say I'm implementing a bytecode compiler, similar to Lua's/Python's... so on.

I'm traversing the AST, generating bytecode instructions, and I run into a break inside of an if-else inside of a while loop:

while (cond1)
    if (cond2)
        ...
    else
        break

(I tried writing out equivalent bytecode but it didn't look too helpful.)

The point is, there are at least 4 "jump" instructions in that example, and I want to find an elegant solution to filling in the jump addresses as I compile the AST... I don't know the jump address for the while loop or the break until after I've fully "compiled" the inner statements.

  1. A pseudocode solution would be ideal
  2. A solution shouldn't depend on whether I'm implementing a register- or stack-based bytecode compiler (I'm playing with both)

I'm not reading the dragon book just yet.

If I'm recursively compiling the AST, when I reach a break statement inside of some arbitrary number of loops and if-else blocks, how should the compiler know which empty label to jump to? I'm guessing some type of label-name stack external to the recursive AST walking function.


回答1:


The principle you need is called "backpatching": fill in a dummy value for the forward jump, generate the code for the body of the statement, and then go back and replace the dummy value with the real one at the end.

e.g.

# start of codegen for 'while'
L1:
  [evaluate cond1]
  jne L2   # forward reference: use a dummy value for now

# start of codegen for 'if ... else'
L3:
  [evaluate cond2]
  jne L4   # another forward reference: use a dummy value for now
  [code in 'if' branch]
  j L5     # another forward reference: use a dummy value for now
L4:
  [code in 'else' branch before 'break']
  j L2
  [code in 'else' branch after 'break']
L5:   # end of 'if ... else'
  # now go back and fill in references to L4, L5 inside the 'if ... else' block
  # end of codegen for 'if ... else'

  # codegen for 'while' continues...
  j L1   # loop
L2:   # end of 'while' loop
  # now go back and fill in references to L2 inside the 'while' block
  # end of codegen for 'while'

Regarding your edit:

If I'm recursively compiling the AST, when I reach a break statement inside of some arbitrary number of loops and if-else blocks, how should the compiler know which empty label to jump to? I'm guessing some type of label-name stack external to the recursive AST walking function.

The target of the jump which implements the break statement is the end of the innermost enclosing loop; yes, you could track that with an explicit external stack.

But, if you have a recursive AST walking function, you already have an implicit stack - the call frames of the recursive function invocations - so you probably don't need an explicit one as well.

e.g. in

...
while (outer)
    ...
    if (outer_done)
        break
    ...

    while (inner)
        ...
        if (inner_done)
            break
        ...
    [break from inner 'while' ends up here]

    ...
    if (outer_done_2)
        break
    ...
[break from outer 'while' ends up here]
...

the entirety of the code generation for the inner while loop happens within a recursive walk of the AST for the body of the outer while loop. Inside this recursive call, you only need to care about the inner loop, not the outer one.

So, all you need to do is:

  • save any current backpatching state at the start of processing a while
  • start a new empty list for tracking any break that appears within the body of this while
  • handle the body (which may result in a recursive call)
  • apply the backpatches
  • then restore the previous state before exiting.

e.g. something a bit like this:

codegen for while:
    save previous break backpatch list
    initialise break backpatch list as empty
    perform codegen for evaluating condition
    perform codegen for body statements
    apply backpatches
    restore previous break backpatch list

The current break backpatch list must be part of some state which is passed to all the codegen functions (the codegen for break needs to append to it). But the saved list could just be tracked as a local variable of the while codegen function.



来源:https://stackoverflow.com/questions/13094001/intelligent-solution-to-computing-jump-addresses-in-a-bytecode-compiler

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!