LL(1) parser implemented with stack: how to build AST?

Submitted by 纵饮孤独 on 2021-01-02 06:08:25

Question


I am currently building a parser by hand. It is an LL(1) parser. At the moment, it is a great recognizer: its function parse(List tokens) decides whether or not tokens is a member of the language.

Now, I want to build the corresponding AST for that input. I know how to do this in a recursive-descent parser (I have already done it). For the challenge, however, I implement my parser using an explicit stack with the classical algorithm:

next <- first token of the input
stack <- START_SYMBOL
do {
    top <- stack.pop()
    if (top is a terminal and top == next) {
        next <- next token of the input
    } else if (top is a non terminal and PARSING_TABLE[top, next] exists) {
        stack.push(PARSING_TABLE[top, next]);  // push the rule's right-hand side, last symbol first
    } else {
        return invalid input;
    }
} while (stack is not empty);
return valid input;

where PARSING_TABLE is the LL(1) table. However, I wonder how to implement the part that builds the AST in such a configuration. I am not asking for a complete implementation, just for an implementation idea.

Thanks!


Answer 1:


Your stack can be annotated so that each entry carries a reference into the AST (i.e. rule number + position in the rule + the target where the result is to be stored) in addition to the symbol itself (terminal or nonterminal).

Your initial stack <- START_SYMBOL is annotated to store its result in the AST root.

Basically, your pop() selects the current AST construct. Then next <- next token saves the token's value in your AST. The stack.push(PARSING_TABLE[top, next]); opens a new AST list, writes it into the construct corresponding to top, and annotates each entry it pushes on the stack with 'rule number + position + target list'.

When parsing is finished, you have the entire tree.
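
A minimal Ruby sketch of this idea, under assumptions that are not in the question: the grammar (S -> "(" L ")" | "a", L -> S L | empty), the hand-written RULES table and the name parse_annotated are all hypothetical, and the rule number and position are left out so that each stack entry carries only its symbol and its target list.

```ruby
# Hypothetical LL(1) table for: S -> "(" L ")" | "a"    L -> S L | (empty)
# Keys are [nonterminal, lookahead token]; values are right-hand sides.
RULES = {
  [:S, "("] => ["(", :L, ")"],
  [:S, "a"] => ["a"],
  [:L, "("] => [:S, :L],
  [:L, "a"] => [:S, :L],
  [:L, ")"] => []
}.freeze

def parse_annotated(tokens)
  input = tokens.dup
  root = []               # the start symbol stores its result here
  stack = [[:S, root]]    # each entry: [symbol, target list]

  until stack.empty?
    sym, target = stack.pop          # pop() selects the current construct
    tok = input.first

    if sym.is_a?(String)             # terminal
      raise "expected #{sym}, got #{tok.inspect}" unless sym == tok
      target << input.shift          # save the token in the AST
    else                             # nonterminal
      rhs = RULES.fetch([sym, tok]) { raise "no rule for #{sym} on #{tok.inspect}" }
      node = [sym]                   # open a new AST list ...
      target << node                 # ... and write it into its parent
      # push in reverse so the leftmost symbol ends up on top of the stack
      rhs.reverse_each { |s| stack.push([s, node]) }
    end
  end

  raise "extraneous input" unless input.empty?
  root.first
end
```

Because every pushed entry already knows which list it writes to, no separate "rule finished" step is needed: once the stack is empty, root holds the complete tree.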

In a precise AST, you might want to ignore some tokens. This can be done via appropriate annotations on the stack entries, set during the push() part. The typical way is to attach some meta information to each of your rules (A -> B C), for example what is to be kept and what the nature of the result is.
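
One possible shape for such meta information, as a hedged Ruby sketch: RuleMeta, its kind and keep fields, and build_node are hypothetical names chosen for illustration, not part of the question or of any library.

```ruby
# Hypothetical per-rule meta information:
#   kind - the nature of the resulting node
#   keep - indices of the right-hand-side symbols that survive in the AST
RuleMeta = Struct.new(:lhs, :rhs, :kind, :keep)

PAREN_RULE = RuleMeta.new(:S, ["(", :L, ")"], :list, [1]) # drop both parentheses
ATOM_RULE  = RuleMeta.new(:S, ["a"], :atom, [0])

# Turn the children matched for a rule into a pruned AST node.
def build_node(rule, children)
  kept = rule.keep.map { |i| children[i] }
  [rule.kind, *kept]
end

p build_node(PAREN_RULE, ["(", [:items], ")"])
p build_node(ATOM_RULE, ["a"])
```

The parser can call something like build_node when it writes a finished construct into its parent, so punctuation tokens never reach the final tree.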




Answer 2:


The difficulty arises because the common method of replacing a nonterminal on the stack with the rhs of its matched rule effectively forgets the grammatical structure at the moment the rule is predicted. But to generate an AST you need that structure later, when the parse of that rule is completed.

Rather than replacing a nonterminal with the rhs symbols of its matching rule, leave it in place and push the matched symbols as a list object. This way the stack retains the hierarchical structure of the grammar.

Parsing consumes symbols in the topmost list. The exhaustion of a list corresponds to the completion of a rule. A nonterminal is removed from the stack when its rule is completed, not when it is predicted.

As the stack is consumed, build a parallel AST structure that remembers the relevant rule and stores the parsed tokens. Thus the stack acts like a predicted AST that flows into the parsed AST.

You can think of this as emulating the call hierarchy of a recursive-descent parser with the stack of symbol-lists as a stack of call-frames.

In Ruby:

# g is the grammar: a list of rules
# s is the terminal sequence to parse (it is consumed as parsing proceeds)
# note: this code does not tokenize EOF

def parse(g, s)
  tab = gen_table(g)
  stack = [[g.start_sym]]
  # intermediate AST node format: [rule-action, symbols...]
  ast = [[->(rhs) { [:_S, rhs[0]] }]]

  loop do
    puts "PARSE\n #{s}\n #{stack}\n #{ast}" # debug trace

    if stack.first.empty? # start symbol completed
      raise "extraneous input" unless s.empty?
      break
    end

    if stack.last.empty? # rule complete
      stack.pop
      node = ast.pop
      # transform the node (e.g. to a class) using the associated rule-action
      node = node.first.(node.drop(1))
      ast.last.push(node)
      stack.last.shift # remove the symbol from the stack after completing it
      next
    end

    raise "incomplete input" if s.empty?
    tok = s.first
    topsym = stack.last.first

    if topsym.is_a? String # terminal
      raise "mismatch #{tok} != #{topsym}" if tok != topsym
      stack.last.shift
      ast.last.push(s.shift)
    elsif topsym.is_a? Symbol # nonterminal
      ri = tab[topsym][tok]
      raise "no rule for #{topsym}, #{tok}" if ri.nil?
      stack.push(g[ri].rhs.clone) # the predicted rhs becomes a new frame
      ast.push([g[ri].action])
    end
  end

  node = ast.first
  node.first.(node.drop(1))
end
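
The code above relies on gen_table, g.start_sym, g[ri].rhs and g[ri].action without showing them. The following is a hedged sketch of definitions that would satisfy those calls; Rule, Grammar, GRAMMAR, the example grammar and the hand-written table are all assumptions made for illustration. In particular, gen_table here is only a stand-in that returns a fixed table, whereas a real one would presumably derive it from the grammar's FIRST and FOLLOW sets.

```ruby
# Hypothetical supporting definitions; none of these appear in the answer.
Rule = Struct.new(:lhs, :rhs, :action)

class Grammar
  def initialize(rules)
    @rules = rules
  end

  # Assumption: the first rule's left-hand side is the start symbol.
  def start_sym
    @rules.first.lhs
  end

  def [](index)
    @rules[index]
  end
end

# S -> "(" L ")" | "a"      L -> S L | (empty)
# Terminals are Strings and nonterminals are Symbols, as parse expects.
GRAMMAR = Grammar.new([
  Rule.new(:S, ["(", :L, ")"], ->(rhs) { [:list, *rhs[1]] }),  # rule 0
  Rule.new(:S, ["a"],          ->(rhs) { [:atom, rhs[0]] }),   # rule 1
  Rule.new(:L, [:S, :L],       ->(rhs) { [rhs[0]] + rhs[1] }), # rule 2
  Rule.new(:L, [],             ->(_rhs) { [] })                # rule 3
])

# Stand-in for gen_table: a hand-written LL(1) table for GRAMMAR only,
# mapping nonterminal -> lookahead token -> rule index.
def gen_table(_grammar)
  {
    S: { "(" => 0, "a" => 1 },
    L: { "(" => 2, "a" => 2, ")" => 3 }
  }
end

# Runs only when the answer's parse method has been loaded as well.
p parse(GRAMMAR, %w[( a a )]) if respond_to?(:parse, true)
```

With the answer's parse loaded, the tokens ( a a ) should come back as [:_S, [:list, [:atom, "a"], [:atom, "a"]]]: the rule actions drop the parentheses and flatten the right-recursive L into a flat list.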


Source: https://stackoverflow.com/questions/20153208/ll1-parser-implemented-with-stack-how-to-build-ast
