Question
How to use lexer rules that start with the same prefix?
I am trying to use two similar lexer rules that begin with the same characters:
TIMECONSTANT: ('0'..'9')+ ':' ('0'..'9')+;
INTEGER     : ('0'..'9')+;
COLON       : ':';
Here is my sample grammar:
grammar TestTime;
text      : (timeexpr | caseblock)*;
timeexpr  : TIME;
caseblock : INT COLON ID;
TIME      : ('0'..'9')+ ':' ('0'..'9')+;
INT       : ('0'..'9')+;
COLON     : ':';
ID        : ('a'..'z')+;
WS        : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
When I try to parse the following text:
12:44
123 : abc
123: abc
The first two lines are parsed correctly, but the third generates an error. For some reason, ANTLR tries to parse '123:' as TIME, although it is not one.
So, is it possible to write a grammar with such lexemes?
Such rules are necessary in my language, which has both case blocks and date/time constants. For example, in my language it is possible to write:
case MyInt of
  1: a := 01.01.2012;
  2: b := 12:44;
  3: ....
end;
Answer 1:
As soon as DIGIT+ ':' is matched, the lexer expects another DIGIT to follow so that it can complete a TIMECONSTANT. If no digit follows, there is no other lexer rule matching DIGIT+ ':' to fall back on, and the lexer will not give up the already matched ':' in order to match an INTEGER.
A possible solution would be to optionally match ':' DIGIT+ at the end of the INTEGER rule and change the type of the token if this gets matched:
grammar T;  
parse
 : (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
 ;
INTEGER      : DIGIT+ ((':' DIGIT)=> ':' DIGIT+ {$type=TIMECONSTANT;})?;
COLON        : ':';
SPACE        : ' ' {skip();};
fragment DIGIT : '0'..'9';
fragment TIMECONSTANT : ;
When parsing the input:
11: 12:13 : 14
the following will be printed:
INTEGER         '11'
COLON           ':'
TIMECONSTANT    '12:13'
COLON           ':'
INTEGER         '14'
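To make the lookahead decision easier to see outside of ANTLR, here is a minimal hand-written sketch in plain Java. It is not the code ANTLR generates; the class name TimeLexerSketch and the method tokenize are hypothetical names chosen for illustration. It only mirrors the idea of the INTEGER rule above: consume the digits first, then peek at the next two characters before committing to the colon.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a hand-written tokenizer, not ANTLR output.
public class TimeLexerSketch {

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ') {
                i++;                                   // SPACE is skipped
            } else if (c == ':') {
                tokens.add("COLON ':'");
                i++;
            } else if (isDigit(c)) {
                int start = i;
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;                               // DIGIT+
                }
                String type = "INTEGER";
                // Plays the role of (':' DIGIT)=> : peek at two characters
                // before consuming the colon.
                if (i + 1 < input.length()
                        && input.charAt(i) == ':'
                        && isDigit(input.charAt(i + 1))) {
                    i++;                               // consume ':'
                    while (i < input.length() && isDigit(input.charAt(i))) {
                        i++;                           // DIGIT+
                    }
                    type = "TIMECONSTANT";             // like {$type=TIMECONSTANT;}
                }
                tokens.add(type + " '" + input.substring(start, i) + "'");
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("11: 12:13 : 14")) {
            System.out.println(token);
        }
    }
}
```

For the input 11: 12:13 : 14 this sketch produces the same five tokens as listed above, because the colon after 11 is followed by a space and is therefore left for the COLON branch.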
EDIT
"Not too nice, but works..."
True. However, this is not an ANTLR shortcoming: most lexer generators I know of will have trouble properly tokenizing such a TIMECONSTANT (when INTEGER and COLON are also present). ANTLR at least provides a way to handle it in the lexer :)
You could also let this be handled by the parser instead of the lexer:
time_const : INTEGER COLON INTEGER;
INTEGER    : '0'..'9'+;
COLON      : ':';
SPACE      : ' ' {skip();};
However, if your language's lexer skips whitespace, then input like:
12 :    34
would also be matched by the time_const rule, of course.
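The whitespace caveat can be checked with a small plain-Java sketch. Again, this is not ANTLR code: TimeConstParserSketch, tokenTypes and matchesTimeConst are hypothetical names, and the tokenizer is a hand-written stand-in for the three lexer rules above. Because spaces are dropped before the parser sees anything, both spellings reduce to the same token sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: mimics the parser-level approach by hand.
public class TimeConstParserSketch {

    // Returns token types only; spaces are dropped, like SPACE : ' ' {skip();}
    public static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ') {
                i++;
            } else if (c == ':') {
                types.add("COLON");
                i++;
            } else if (c >= '0' && c <= '9') {
                while (i < input.length()
                        && input.charAt(i) >= '0' && input.charAt(i) <= '9') {
                    i++;
                }
                types.add("INTEGER");
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at index " + i);
            }
        }
        return types;
    }

    // Mirrors time_const : INTEGER COLON INTEGER;
    public static boolean matchesTimeConst(String input) {
        return tokenTypes(input).equals(Arrays.asList("INTEGER", "COLON", "INTEGER"));
    }

    public static void main(String[] args) {
        System.out.println(matchesTimeConst("12:34"));      // prints true
        System.out.println(matchesTimeConst("12 :    34")); // prints true
    }
}
```

Whether accepting the spaced form is acceptable depends on the language being designed; the lexer-level solution does not have this ambiguity.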
Answer 2:
ANTLR lexers can't backtrack, which means that once the lexer reaches the ':' in the TIMECONSTANT rule, it must complete the rule or an exception will be thrown. You can get your grammar working by using a syntactic predicate to test for the presence of a digit following the colon.
TIMECONSTANT: ('0'..'9')+ (':' '0'..'9')=> ':' ('0'..'9')+;
INTEGER     : ('0'..'9')+;
COLON       : ':';
This will force ANTLR to look beyond the colon before it decides that it is in a TIMECONSTANT rule.
Source: https://stackoverflow.com/questions/10029137/antlr-how-to-use-lexer-rules-having-same-starting