ANTLR3 grammar does not match rule with predicate

Question

I have a combined grammar that needs two identifier lexer rules. Both identifiers can be in use at the same time. Identifier1 comes before Identifier2 in the grammar.

The first identifier rule is static, whereas the second changes depending on a flag (via a semantic predicate).

I want the second identifier to match in parser rules. But because both identifiers can match some of the same inputs, the lexer never produces ID2 for them.

I have created a small grammar to illustrate the problem:

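grammar IdTest;

@lexer::members
{
  // Set from Java before lexing; controls whether '_' is allowed in ID2.
  private boolean flag;

  public void setFlag(boolean flag)
  {
    this.flag = flag;
  }
}

identifier1
  : ID1
  ;

identifier2
  : ID2
  ;

// Listed first in the grammar, ahead of ID2.
ID1 : (CHARS)* ;

ID2 : (CHARS | ({flag}? '_'))* ;

fragment CHARS
  : ('a' .. 'z')
  ;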

If I try to match the identifier2 rule like this:

    ANTLRStringStream in = new ANTLRStringStream("abcabde");
    IdTestLexer lexer = new IdTestLexer(in);
    lexer.setFlag(true);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    IdTestParser parser = new IdTestParser(tokens);
    parser.identifier2();

It reports the error: line 1:0 missing ID2 at 'abcabde'

Answer 1:

ID1 : (CHARS) *;
ID2 : (CHARS | ({flag}? '_'))* ;

For ANTLR, these two rules mean:

- If the input consists only of lowercase letters, it is ID1.
- If the input mixes lowercase letters and '_' and flag == true, it is ID2.

Note that if flag == false, ID2 will never be matched.

The two basic rules the lexer follows are:

- It matches the token that covers the longest sub-sequence of the input.
- If multiple tokens can match the same input, it uses the one that comes first in the grammar.
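
The effect of these two rules on the grammar in the question can be sketched in plain Java. This is only a simplified model of the rules as stated here, not ANTLR's actual implementation: the class name LexerChoiceSketch, the method pickRule, and the regular expressions standing in for ID1 and ID2 are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerChoiceSketch {
    // Hypothetical helper: returns the name of the rule that a
    // "longest match, earliest rule wins ties" lexer would pick.
    static String pickRule(String input, boolean flag) {
        // Rules in grammar order; LinkedHashMap preserves that order.
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("ID1", Pattern.compile("[a-z]*"));
        // Stand-in for the predicate: '_' is only allowed when flag is true.
        rules.put("ID2", Pattern.compile(flag ? "[a-z_]*" : "[a-z]*"));

        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            int len = m.lookingAt() ? m.end() : -1;
            // Strict '>' keeps the earlier rule when lengths are equal.
            if (len > bestLen) {
                bestLen = len;
                best = rule.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pickRule("abcabde", true));   // both match 7 chars, tie goes to ID1
        System.out.println(pickRule("abc_abde", true));  // only ID2 can cover the '_'
        System.out.println(pickRule("abcabde", false));  // tie again, ID1
    }
}
```

Under this model, 'abcabde' is always ID1, regardless of the flag, which is consistent with the "missing ID2" error reported in the question.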

I believe your core issue is a misunderstanding of the difference between the lexer and the parser and how each is used. The question you should ask yourself is: when should 'abcabde' be matched as ID1, and when as ID2?

- Always ID1: then your grammar is correct as it is now.
- Always ID2: then you should swap the two rules, but note that in that case ID1 will never be matched.
- It depends on flag: then you need to modify the predicate according to your logic; just toggling the underscore isn't enough.
- It depends on where in the input the identifier is used: then this is not something the lexer can decide, and you need to tell the two kinds of identifiers apart in the parser rather than the lexer. Formally, the lexer works with a regular language, while you need a context-free language to make that kind of decision about identifiers.
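
The last option can be sketched in plain Java as follows. The idea is that the lexer emits a single generic identifier token and the consumer decides, based on which parser rule is asking, whether the text is acceptable. Everything here is hypothetical: the class IdentifierContextSketch, the Rule enum, and the accepts method are illustration only, and in a real grammar the same check could live in a parser-rule action or semantic predicate instead.

```java
class IdentifierContextSketch {
    // Hypothetical: which parser rule is asking for an identifier.
    enum Rule { IDENTIFIER1, IDENTIFIER2 }

    // Decides whether a generic identifier token is valid in the given context.
    static boolean accepts(Rule rule, String tokenText, boolean flag) {
        // The single lexer token: lowercase letters and underscores
        // (non-empty here, unlike the '*' rules in the question).
        if (!tokenText.matches("[a-z_]+")) {
            return false;
        }
        boolean hasUnderscore = tokenText.indexOf('_') >= 0;
        if (rule == Rule.IDENTIFIER1) {
            // identifier1 never allows '_'.
            return !hasUnderscore;
        }
        // identifier2 allows '_' only when the flag is set.
        return !hasUnderscore || flag;
    }

    public static void main(String[] args) {
        System.out.println(accepts(Rule.IDENTIFIER2, "abcabde", true));
        System.out.println(accepts(Rule.IDENTIFIER1, "abc_de", true));
    }
}
```

With this split, 'abcabde' is accepted wherever identifier2 is expected, because the decision no longer depends on which of two overlapping lexer rules happens to win.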

Source: https://stackoverflow.com/questions/51359630/antlr3-grammar-does-not-match-rule-with-predicate

Tags

java

grammar

identifier

antlr3

lexer