I have been starting to use ANTLR and have noticed that it is pretty fickle with its lexer rules. An extremely frustrating example is the following:
grammar output;
test: FILEPATH NEWLINE TITLE ;
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
NEWLINE: '\r'? '\n' ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
This grammar will not match something like:
c:\test.txt
x
Oddly if I change TITLE to be TITLE: 'x' ; it still fails this time giving an error message saying "mismatched input 'x' expecting 'x'" which is highly confusing. Even more oddly if I replace the usage of TITLE in test with FILEPATH the whole thing works (although FILEPATH will match more than I am looking to match so in general it isn't a valid solution for me).
I am highly confused as to why ANTLR is giving such extremely strange errors and then suddenly working for no apparent reason when shuffling things around.
This seems to be a common misunderstanding of ANTLR:
Language Processing in ANTLR:
The Language Processing is done in two strictly separated phases:
- Lexing, i.e. partitioning the text into tokens
- Parsing, i.e. building a parse tree from the tokens
Since lexing must preceed parsing there is a consequence: The lexer is independent of the parser, the parser cannot influence lexing.
Lexing
Lexing in ANTLR works as following:
- all rules with uppercase first character are lexer rules
- the lexer starts at the beginning and tries to find a rule that matches best to the current input
- a best match is a match that has maximum length, i.e. the token that results from appending the next input character to the maximum length match is not matched by any lexer rule
- tokens are generated from matches:
- if one rule matches the maximum length match the corresponding token is pushed into the token stream
- if multiple rules match the maximum length match the first defined token in the grammar is pushed to the token stream
Example: What is wrong with your grammar
Your grammar has two rules that are critical:
FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
Each match, that is matched by TITLE will also be matched by FILEPATH. And FILEPATH is defined before TITLE: So each token that you expect to be a title would be a FILEPATH.
There are two hints for that:
- keep your lexer rules disjunct (no token should match a superset of another).
- if your tokens intentionally match the same strings, then put them into the right order (in your case this will be sufficient).
- if you need a parser driven lexer you have to change to another parser generator: PEG-Parsers or GLR-Parsers will do that (but of course this can produce other problems).
I have the same error, but I cannot imagine which rule of the lexer could be involved.
I am trying to parse some CDE files of Cobol - I think this is quite HP Nonstop specific.
Anyway, what I want to parse is something like
* SCHEMA PRODUCED DATE - TIME : 1/29/2019 - 15:17:01
?SECTION MYREQUEST,NONSTOP
* Definition MYREQUEST created on 05/11/2016 at 11:05
01 MYREQUEST.
...
and the parser fails with
mismatched input '?SECTION foo' expecting '?'
My grammar is this:
grammar CdeFile;
cdeFile : line+ ;
line : sectionLine ;
sectionLine : QUESTIONMARK SECTION sectionName '\r'? '\n' ;
sectionName : TEXTLIST ;
QUESTIONMARK : '?' ;
SECTION: S E C T I O N ;
TEXTLIST : TEXT (',' TEXT)* ;
TEXT : ~[,\n\r"]+ ;
WS : ( ' ' | '\t' | '\f' )+ -> skip;
LINE_COMMENT
: '*' {Column == 1}? '*'* ~('\n'|'\r')* '\r'? '\n' ->skip
;
// case insensitive chars
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');
The QUESTIONMARK values are in sync, everything is rebuilded - still this strange message.
This was not directly OP's problem, but for those who have the same error message, here is something you could check.
I had the same Mismatched Input 'x' expecting 'x' vague error message when I introduced a new keyword. The reason for me was that I had placed the new key word after my VARNAME lexer rule, which assigned it as a variable name instead of as the new keyword. I fixed it by putting the keywords before the VARNAME rule.
来源:https://stackoverflow.com/questions/29777778/antlr-4-5-mismatched-input-x-expecting-x