ANTLR 4.5 - Mismatched Input 'x' expecting 'x'

♀尐吖头ヾ submitted on 2019-11-27 11:35:02
CoronA

This seems to be a common misunderstanding of ANTLR:

Language Processing in ANTLR:

Language processing is done in two strictly separated phases:

  • Lexing, i.e. partitioning the text into tokens
  • Parsing, i.e. building a parse tree from the tokens

Since lexing must precede parsing, there is a consequence: the lexer is independent of the parser, and the parser cannot influence lexing.

Lexing

Lexing in ANTLR works as follows:

  • all rules whose names start with an uppercase character are lexer rules
  • the lexer starts at the beginning of the input and tries to find the rule that best matches the current input
  • the best match is the longest one, i.e. appending the next input character to it would yield a string that no lexer rule matches
  • tokens are generated from matches:
    • if exactly one rule matches the longest match, the corresponding token is pushed to the token stream
    • if multiple rules match the longest match, the token of the rule defined first in the grammar is pushed to the token stream
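
The selection rule above can be sketched in a few lines of Python. This is only an illustration, not how ANTLR is implemented: greedy regular expressions stand in for lexer rules, and the rule names ID and ALNUM are hypothetical.

```python
import re

def next_token(rules, text, pos=0):
    """Return (rule_name, matched_text) for the token starting at pos.

    rules is a list of (name, regex) pairs in grammar order.
    """
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        # Only a strictly longer match replaces the current best, so on a
        # tie the rule that was defined first keeps the token.
        if m and m.group() and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

# Hypothetical rules, listed in grammar order.
RULES = [
    ("ID", r"[a-z]+"),
    ("ALNUM", r"[a-z0-9]+"),
]

print(next_token(RULES, "ab12"))  # ('ALNUM', 'ab12'): longest match wins
print(next_token(RULES, "ab"))    # ('ID', 'ab'): tie, first defined rule wins
```

Note that the parser plays no role in this choice: the winner depends only on the input text and on the order of the lexer rules.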

Example: What is wrong with your grammar

Your grammar has two rules that are critical:

FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;

Every string matched by TITLE is also matched by FILEPATH, and FILEPATH is defined before TITLE. So every token that you expect to be a TITLE will be a FILEPATH.

There are three hints for that:

  • keep your lexer rules disjoint (no token should match a superset of another).
  • if your tokens intentionally match the same strings, then put them in the right order (in your case this will be sufficient).
  • if you need a parser-driven lexer, you have to switch to another parser generator: PEG parsers or GLR parsers will do that (but of course this can produce other problems).
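
As a rough check of the second hint, the two rules can be approximated with Python regular expressions and run through the same longest-match, first-rule-wins selection. The helper first_token and the sample inputs are assumptions made for this sketch; it is not ANTLR itself.

```python
import re

# Regex approximations of the two lexer rules from the question.
FILEPATH = ("FILEPATH", r"[A-Za-z0-9:\\/ \-_.]+")
TITLE = ("TITLE", r"[A-Za-z ]+")

def first_token(rules, text):
    """Longest match wins; on a tie the rule listed first wins."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and m.group() and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

original = [FILEPATH, TITLE]   # order used in the question
reordered = [TITLE, FILEPATH]  # TITLE moved in front of FILEPATH

print(first_token(original, "My Title"))        # ('FILEPATH', 'My Title')
print(first_token(reordered, "My Title"))       # ('TITLE', 'My Title')
print(first_token(reordered, "C:/docs/a.txt"))  # ('FILEPATH', 'C:/docs/a.txt')
```

One consequence of reordering: a path that consists only of letters and spaces is then tokenized as TITLE, because both rules match the same length and TITLE comes first.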

I have the same error, but I cannot imagine which rule of the lexer could be involved.
I am trying to parse some COBOL CDE files; I think this is quite specific to HP NonStop.

Anyway, what I want to parse is something like

* SCHEMA PRODUCED DATE - TIME : 1/29/2019 - 15:17:01
?SECTION MYREQUEST,NONSTOP
* Definition MYREQUEST created on 05/11/2016 at 11:05
  01 MYREQUEST. 
  ...

and the parser fails with

mismatched input '?SECTION foo' expecting '?'

My grammar is this:

grammar CdeFile;

cdeFile : line+ ;
line : sectionLine ;
sectionLine : QUESTIONMARK SECTION sectionName '\r'? '\n' ;
sectionName : TEXTLIST ;

QUESTIONMARK : '?' ;
SECTION: S E C T I O N ;

TEXTLIST : TEXT (',' TEXT)* ;
TEXT : ~[,\n\r"]+ ;

WS : ( ' ' | '\t' | '\f' )+ -> skip;
LINE_COMMENT 
     : '*' {Column == 1}? '*'* ~('\n'|'\r')* '\r'? '\n' ->skip 
     ; 


// case insensitive chars
fragment A:('a'|'A');
fragment B:('b'|'B');
fragment C:('c'|'C');
fragment D:('d'|'D');
fragment E:('e'|'E');
fragment F:('f'|'F');
fragment G:('g'|'G');
fragment H:('h'|'H');
fragment I:('i'|'I');
fragment J:('j'|'J');
fragment K:('k'|'K');
fragment L:('l'|'L');
fragment M:('m'|'M');
fragment N:('n'|'N');
fragment O:('o'|'O');
fragment P:('p'|'P');
fragment Q:('q'|'Q');
fragment R:('r'|'R');
fragment S:('s'|'S');
fragment T:('t'|'T');
fragment U:('u'|'U');
fragment V:('v'|'V');
fragment W:('w'|'W');
fragment X:('x'|'X');
fragment Y:('y'|'Y');
fragment Z:('z'|'Z');

The QUESTIONMARK values are in sync and everything has been rebuilt, but I still get this strange message.
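
Applying the longest-match rule from the answer above to this grammar suggests one plausible cause: TEXT excludes only commas, line breaks and double quotes, so it also accepts '?' and spaces, and TEXTLIST can therefore swallow the whole line. The sketch below approximates the lexer rules with Python regular expressions. LINE_COMMENT is left out because of its predicate, and the helper first_token is an assumption made for this illustration, not ANTLR output.

```python
import re

TEXT = r'[^,\n\r"]+'

# Regex approximations of the lexer rules, in grammar order.
RULES = [
    ("QUESTIONMARK", r"\?"),
    ("SECTION", r"(?i:section)"),
    ("TEXTLIST", TEXT + r"(?:," + TEXT + r")*"),
    ("TEXT", TEXT),
    ("WS", r"[ \t\f]+"),
]

def first_token(rules, text):
    """Longest match wins; on a tie the rule listed first wins."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and m.group() and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

line = "?SECTION MYREQUEST,NONSTOP"
print(first_token(RULES, line))  # ('TEXTLIST', '?SECTION MYREQUEST,NONSTOP')
```

If this approximation holds, QUESTIONMARK matches only one character while TEXTLIST matches the entire line, so the parser would receive a single TEXTLIST token where it expects '?'.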

This was not directly OP's problem, but for those who have the same error message, here is something you could check.


I had the same vague Mismatched Input 'x' expecting 'x' error message when I introduced a new keyword. In my case, I had placed the new keyword after my VARNAME lexer rule, so the lexer tokenized it as a variable name instead of as the new keyword. I fixed it by putting the keywords before the VARNAME rule.
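
A minimal sketch of that situation, again using Python regular expressions in place of lexer rules. The keyword WHILE and the VARNAME pattern are hypothetical, since the actual grammar is not shown.

```python
import re

KEYWORD = ("WHILE", r"while")                      # hypothetical new keyword
VARNAME = ("VARNAME", r"[A-Za-z_][A-Za-z0-9_]*")   # hypothetical identifier rule

def first_token(rules, text):
    """Longest match wins; on a tie the rule listed first wins."""
    best = None
    for name, pattern in rules:
        m = re.match(pattern, text)
        if m and m.group() and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(first_token([VARNAME, KEYWORD], "while"))      # ('VARNAME', 'while')
print(first_token([KEYWORD, VARNAME], "while"))      # ('WHILE', 'while')
print(first_token([KEYWORD, VARNAME], "whileLoop"))  # ('VARNAME', 'whileLoop')
```

Placing the keyword first only changes the outcome for exact ties; a longer identifier such as whileLoop is still tokenized as VARNAME because its match is longer.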
