ANTLR: From CommonTree to useful object graph

纵然是瞬间 提交于 2019-11-28 08:47:37

Here's a a possible way. In short, these are the step you'd perform:

  1. create a combined grammar that will generate the lexer and parser;
  2. mix AST rewrite rules in the grammar from (1) to transform the flat list of tokens into a proper tree;
  3. write a tree grammar that can walk the tree from (2);
  4. mix custom code inside your tree walker;
  5. test it.

1

Let's create a small expression parser supporting +, -, *, /, (...) and numbers, which could look like:

grammar Exp; // file: Exp.g

eval
  :  exp EOF
  ;

exp 
  :  addExp 
  ;

addExp
  :  mulExp ((Add | Sub) mulExp)*
  ;

mulExp
  :  unaryExp ((Mul | Div) unaryExp)*
  ;

unaryExp
  :  Sub atom
  |  atom
  ;

atom
  :  Number
  |  '(' exp ')'
  ;

Add    : '+';
Sub    : '-';
Mul    : '*';
Div    : '/';
Number : '0'..'9'+;
Space  : ' ' {skip();};

2

Including rewrite rules, it will look like:

grammar Exp; // file: Exp.g

options {
  output=AST;
}

tokens {
  U_SUB;
}

eval 
  :  exp EOF -> exp
  ;

exp 
  :  addExp 
  ;

addExp
  :  mulExp ((Add | Sub)^ mulExp)*
  ;

mulExp
  :  unaryExp ((Mul | Div)^ unaryExp)*
  ;

unaryExp
  :  Sub atom -> ^(U_SUB atom)
  |  atom
  ;

atom
  :  Number
  |  '(' exp ')' -> exp
  ;

Add    : '+';
Sub    : '-';
Mul    : '*';
Div    : '/';
Number : '0'..'9'+;
Space  : ' ' {skip();};

Now an expression like 10 - 2 * (3 + 8) will be transformed to:

3

To create a tree grammar that generates an iterator for the AST generated in (2), you'd do something like this:

tree grammar ExpWalker; // file: ExpWalker.g

options {
  tokenVocab=Exp; // use the tokens from Exp.g
  ASTLabelType=CommonTree;
}

eval
  :  exp
  ;

exp
  :  ^(Add exp exp)
  |  ^(Sub exp exp)
  |  ^(Mul exp exp)
  |  ^(Div exp exp)
  |  ^(U_SUB exp)
  |  Number
  ;

4

And to mix your custom classes in this tree iterator, do something like this:

tree grammar ExpWalker; // file: ExpWalker.g

options {
  tokenVocab=Exp; // use the tokens from Exp.g
  ASTLabelType=CommonTree;
}

eval returns [ExpNode e]
  :  exp {e = $exp.e;}
  ;

exp returns [ExpNode e]
  :  ^(Add a=exp b=exp) {e = new AddExp($a.e, $b.e);}
  |  ^(Sub a=exp b=exp) {e = new SubExp($a.e, $b.e);}
  |  ^(Mul a=exp b=exp) {e = new MulExp($a.e, $b.e);}
  |  ^(Div a=exp b=exp) {e = new DivExp($a.e, $b.e);}
  |  ^(U_SUB a=exp)     {e = new UnaryExp($a.e);}
  |  Number             {e = new NumberExp($Number.text);}
  ;

5

Here's some code to test all classes (stick it all just in one file: Main.java):

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "10 - 2 * (3 + 8)";
    ExpLexer lexer = new ExpLexer(new ANTLRStringStream(source));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    ExpParser parser = new ExpParser(tokens);
    ExpParser.eval_return returnValue = parser.eval();
    CommonTree tree = (CommonTree)returnValue.getTree();
    CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
    ExpWalker walker = new ExpWalker(nodes);
    ExpNode root = walker.eval();
    System.out.println(source + " = " + root.evaluate());
  }
}

interface ExpNode {
  double evaluate();
}

class NumberExp implements ExpNode {

  final double num;

  NumberExp(String s) {
    num = Double.parseDouble(s);
  }

  @Override
  public double evaluate() {
    return num;
  }
}

class AddExp implements ExpNode {

  final ExpNode left, right;

  AddExp(ExpNode a, ExpNode b) {
    left = a;
    right = b;
  }

  @Override
  public double evaluate() {
    return left.evaluate() + right.evaluate();
  }
}

class SubExp implements ExpNode {

  final ExpNode left, right;

  SubExp(ExpNode a, ExpNode b) {
    left = a;
    right = b;
  }

  @Override
  public double evaluate() {
    return left.evaluate() - right.evaluate();
  }
}

class MulExp implements ExpNode {

  final ExpNode left, right;

  MulExp(ExpNode a, ExpNode b) {
    left = a;
    right = b;
  }

  @Override
  public double evaluate() {
    return left.evaluate() * right.evaluate();
  }
}

class DivExp implements ExpNode {

  final ExpNode left, right;

  DivExp(ExpNode a, ExpNode b) {
    left = a;
    right = b;
  }

  @Override
  public double evaluate() {
    return left.evaluate() / right.evaluate();
  }
}

class UnaryExp implements ExpNode {

  final ExpNode exp;

  UnaryExp(ExpNode e) {
    exp = e;
  }

  @Override
  public double evaluate() {
    return -exp.evaluate();
  }
}

and then do:

# generate a lexer & parser
java -cp antlr-3.2.jar org.antlr.Tool Exp.g

# generate the tree walker
java -cp antlr-3.2.jar org.antlr.Tool ExpWalker.g

# compile everything
javac -cp antlr-3.2.jar *.java

# run the main class
java -cp .:antlr-3.2.jar Main         # *nix 
java -cp .;antlr-3.2.jar Main         # Windows

which prints:

10 - 2 * (3 + 8) = -12.0

You could skip the tree-walker and mix all the code and returns [...] inside your combined grammar, but IMO, a tree grammar keeps things more orderly because the lexer rules, and tokens like ( and ) etc. are removed from it.

HTH

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!