ANTLR chaining 1 to 1 grammar rules together to solve conditionals

Submitted by 被刻印的时光 ゝ on 2019-12-06 03:52:21
Bart Kiers

As already mentioned, the "chain" is needed to properly handle operator precedence. Without it, input like 1+2*3 would be parsed as:

     *
    / \
   +   3
  / \
 1   2

instead of:

  +
 / \
1   *
   / \
  2   3
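
To see why the chain yields the second tree, here is a minimal hand-written sketch in Java. It is not ANTLR-generated code, and the class and method names (ChainDemo, add, mult, flat) are made up for illustration. Each precedence level gets its own method, and the lower-precedence method calls the higher-precedence one for its operands, so 2*3 is grouped before the + is applied. The flat method shows what happens without the chain.

```java
// Hand-written illustration of a rule "chain"; not ANTLR output.
// Handles non-negative integers, '+' and '*', with no whitespace.
class ChainDemo {
    private final String src;
    private int pos = 0;

    ChainDemo(String src) { this.src = src; }

    // add : mult ('+' mult)* ;   lowest precedence, top of the chain
    String add() {
        String left = mult();
        while (peek() == '+') {
            pos++;
            left = "(" + left + "+" + mult() + ")";
        }
        return left;
    }

    // mult : atom ('*' atom)* ;  binds tighter because add() reaches it first
    String mult() {
        String left = atom();
        while (peek() == '*') {
            pos++;
            left = "(" + left + "*" + atom() + ")";
        }
        return left;
    }

    // flat : atom (('+' | '*') atom)* ;  no chain, so no precedence
    String flat() {
        String left = atom();
        while (peek() == '+' || peek() == '*') {
            char op = src.charAt(pos++);
            left = "(" + left + op + atom() + ")";
        }
        return left;
    }

    // atom : NUMBER ;
    String atom() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + start);
        return src.substring(start, pos);
    }

    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    static String withChain(String s) { return new ChainDemo(s).add(); }
    static String withoutChain(String s) { return new ChainDemo(s).flat(); }

    public static void main(String[] args) {
        System.out.println(withChain("1+2*3"));    // (1+(2*3))
        System.out.println(withoutChain("1+2*3")); // ((1+2)*3)
    }
}
```

The parenthesized strings stand in for the two trees: the chained version nests the multiplication under the addition, the flat version does the opposite.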

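ANTLR 4 supports direct left-recursive rules, where a rule refers to itself at the start of one of its own alternatives:

foo
 : foo '?' foo
 | TOKEN
 ;

but not indirect left-recursive rules, where the recursion passes through another rule: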

foo
 : bar
 | TOKEN
 ;

bar
 : foo '?' foo
 ;

You can rewrite these rules as follows:

expression
 : '-' expression
 | '(' type_name ')' expression
 | expression ('*' | '/' | '%') expression
 | expression ('+' | '-') expression
 | expression ('<<' | '>>') expression
 | expression ('<' | '>' | '<=' | '>=') expression
 | expression ('!=' | '==') expression
 | expression '&' expression
 | expression '^' expression
 | expression '|' expression
 | expression '&&' expression
 | expression '||' expression
 | expression '?' expression ':' expression
 | IDENTIFIER
 | NUMBER
 ;

When more than one alternative could match an expression, ANTLR resolves the ambiguity in favor of the alternative listed first. In other words, alternatives placed earlier in the rule get higher precedence than alternatives placed lower: ('*' | '/' | '%') binds tighter than ('+' | '-'), which binds tighter than ('<<' | '>>'), and so on.
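
The effect of alternative order can be imitated by hand with precedence climbing. The sketch below is only an illustration under my own assumptions, not the code ANTLR generates: the PrecedenceDemo class and its LEVELS table are hypothetical, tokens must be separated by spaces, and only a subset of the binary operators is included. Rows listed earlier in the table bind tighter, mirroring alternatives listed earlier in the rule.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written precedence-climbing sketch; not ANTLR output.
class PrecedenceDemo {
    // Earlier rows bind tighter, like earlier alternatives in the rule.
    private static final List<List<String>> LEVELS = Arrays.asList(
        Arrays.asList("*", "/", "%"),
        Arrays.asList("+", "-"),
        Arrays.asList("<<", ">>"),
        Arrays.asList("&&"),
        Arrays.asList("||"));

    private final String[] tokens;
    private int pos = 0;

    PrecedenceDemo(String input) { this.tokens = input.trim().split("\\s+"); }

    // Binding strength: a higher number binds tighter; -1 means "not an operator".
    private static int strength(String tok) {
        for (int i = 0; i < LEVELS.size(); i++) {
            if (LEVELS.get(i).contains(tok)) return LEVELS.size() - i;
        }
        return -1;
    }

    // Parses an expression whose operators all have strength >= minStrength.
    private String expression(int minStrength) {
        if (pos >= tokens.length) throw new IllegalArgumentException("operand expected");
        String left = tokens[pos++];              // operand: IDENTIFIER or NUMBER
        while (pos < tokens.length) {
            int s = strength(tokens[pos]);
            if (s < minStrength) break;           // weaker operator: let the caller handle it
            String op = tokens[pos++];
            String right = expression(s + 1);     // s + 1 keeps the operator left-associative
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    static String group(String input) { return new PrecedenceDemo(input).expression(1); }

    public static void main(String[] args) {
        System.out.println(group("1 + 2 * 3"));   // (1 + (2 * 3))
        System.out.println(group("a << b || c")); // ((a << b) || c)
    }
}
```

Moving a row up or down in LEVELS changes the grouping in the same way that reordering alternatives changes it in the grammar.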

I know from your earlier question ("Once grammar is complete, what's the best way to walk an ANTLR v4 tree?") that you're using a listener to walk the tree. With an expression rule like the one I just showed, your enterExpression(...) and exitExpression(...) methods would need a lot of manual inspection to find out which of the alternatives matched. This is where labels come in handy. You simply label each alternative in your expression rule:

expression
 : '-' expression                                  #unaryExpr
 | '(' type_name ')' expression                    #castExpr
 | expression ('*' | '/' | '%') expression         #multExpr
 | expression ('+' | '-') expression               #addExpr
 | expression ('<<' | '>>') expression             #...
 | expression ('<' | '>' | '<=' | '>=') expression 
 | expression ('!=' | '==') expression
 | expression '&' expression
 | expression '^' expression
 | expression '|' expression
 | expression '&&' expression
 | expression '||' expression
 | expression '?' expression ':' expression
 | IDENTIFIER
 | NUMBER
 ;

(note that when you label one, you must label them all!)

The generated base listener class will then have an enter and an exit method for each labeled alternative:

public void enterUnaryExpr(...)
public void exitUnaryExpr(...)

public void enterCastExpr(...)
public void exitCastExpr(...)

public void enterMultExpr(...)
public void exitMultExpr(...)

...
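
To show how those per-alternative methods get called, here is a self-contained sketch that does not depend on the ANTLR runtime. Everything in it (ExprListener, Node, MultExpr, AddExpr, Walker, LabelDemo) is a hand-written stand-in I made up for what ANTLR would generate from the labels; the real generated classes have different names and richer context objects. The point is only that each node type dispatches to its own enter/exit pair, so the listener never has to inspect which alternative matched.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a generated listener: one enter/exit pair per labeled alternative.
interface ExprListener {
    default void enterMultExpr(Node n) {}
    default void exitMultExpr(Node n) {}
    default void enterAddExpr(Node n) {}
    default void exitAddExpr(Node n) {}
}

// Stand-in for a parse tree node (hypothetical, much simpler than ANTLR's contexts).
abstract class Node {
    final List<Node> children = new ArrayList<>();
    final String text;
    Node(String text, Node... kids) {
        this.text = text;
        for (Node k : kids) children.add(k);
    }
    void enter(ExprListener l) {}   // leaves trigger no callback in this sketch
    void exit(ExprListener l) {}
}

class Atom extends Node {
    Atom(String text) { super(text); }
}

class MultExpr extends Node {
    MultExpr(Node a, Node b) { super("*", a, b); }
    @Override void enter(ExprListener l) { l.enterMultExpr(this); }
    @Override void exit(ExprListener l) { l.exitMultExpr(this); }
}

class AddExpr extends Node {
    AddExpr(Node a, Node b) { super("+", a, b); }
    @Override void enter(ExprListener l) { l.enterAddExpr(this); }
    @Override void exit(ExprListener l) { l.exitAddExpr(this); }
}

class Walker {
    // Depth-first walk: enter the node, visit its children, then exit the node.
    static void walk(ExprListener l, Node n) {
        n.enter(l);
        for (Node c : n.children) walk(l, c);
        n.exit(l);
    }
}

class LabelDemo {
    // Builds the tree for 1+2*3 by hand and records which exit methods fire.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Node tree = new AddExpr(new Atom("1"), new MultExpr(new Atom("2"), new Atom("3")));
        Walker.walk(new ExprListener() {
            @Override public void exitMultExpr(Node n) { log.add("exitMultExpr"); }
            @Override public void exitAddExpr(Node n) { log.add("exitAddExpr"); }
        }, tree);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [exitMultExpr, exitAddExpr]
    }
}
```

The exit methods fire for the inner 2*3 before the outer addition, which is the order you want if the listener computes values bottom-up.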

There is a very good reason for doing it this way: operator precedence. Taking your example of the logical OR and left bitwise shift, think about something like

if (a << b || c)

Objective-C precedence rules say that '<<' binds tighter than '||', so the correct way to evaluate this is

(a << b) || c

The parser rules manage this with the chain you mention. Because the rule for '||' is higher up in the chain, it is entered first and reaches the '<<' rule only through its operands, so the parser correctly produces a << b as a subexpression of the || operator.

There is no better way in ANTLR 3. In ANTLR 4 there is, because ANTLR 4 allows directly left-recursive rules. I highly recommend "The Definitive ANTLR 4 Reference", as it has a very good explanation of this issue.
