ANTLR lexer can't look ahead at all

Submitted by 老子叫甜甜 on 2020-01-11 09:23:10

Question


I have the following grammar:

rule: 'aaa' | 'a' 'a';

It can successfully parse the string 'aaa', but it fails to parse 'aa' with the following error:

line 1:2 mismatched character '<EOF>' expecting 'a'

FYI, this is the lexer's problem, not the parser's, because I never even call the parser. The main function looks like this:

@members {
  // Debug driver: print the type of every token the lexer produces,
  // stopping at EOF. The parser itself is never invoked.
  public static void main(String[] args) throws Exception {
    RecipeLexer lexer = new RecipeLexer(new ANTLRInputStream(System.in));
    for (Token t = lexer.nextToken(); t.getType() != EOF; t = lexer.nextToken())
      System.out.println(t.getType());
  }
}

The result is the same with the more obvious version:

rule: AAA | A A;
AAA: 'aaa';
A: 'a';

Obviously the ANTLR lexer tries to match the input 'aa' with the rule AAA, which fails. Regardless of ANTLR being an LL(*) parser or whatever, the lexer should work independently of the parser, and it should be able to resolve this ambiguity. The grammar works fine with good old lex (or flex), but it doesn't seem to work with ANTLR. So what is the problem here?
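For comparison, here is a toy Java model of the longest-match ("maximal munch") rule that lex and flex apply: at each position every token rule is tried, the longest match wins, and the rule listed first breaks ties. The class name `LongestMatchLexer` and its token table are hypothetical. Real flex compiles the rules into a DFA and backs up to the last accepting state rather than testing each literal separately, but for fixed strings like these the outcome is the same.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of lex/flex-style longest-match tokenization
// (hypothetical class, not flex's real DFA-based implementation).
public class LongestMatchLexer {
    // Hypothetical token table mirroring the question's rules: AAA: 'aaa'; A: 'a';
    private static final String[] NAMES = {"AAA", "A"};
    private static final String[] LITERALS = {"aaa", "a"};

    // Returns the token names for the input; throws if no rule matches somewhere.
    public static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1;
            // Try every rule; keep the longest match, earlier rules win ties.
            for (int i = 0; i < LITERALS.length; i++) {
                boolean matches = input.startsWith(LITERALS[i], pos);
                if (matches && (best < 0 || LITERALS[i].length() > LITERALS[best].length())) {
                    best = i;
                }
            }
            if (best < 0) {
                throw new IllegalStateException("no rule matches at position " + pos);
            }
            tokens.add(NAMES[best]);
            pos += LITERALS[best].length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("aaa")); // [AAA]
        System.out.println(lex("aa"));  // [A, A]
    }
}
```

Because A is still a candidate at every position, the input "aa" simply comes out as two A tokens in this model.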

Thanks for the help!


Answer 1:


ANTLR's generated parsers are (or can be) LL(*), but its lexers are not.

When the lexer sees the input "aa", it tries to match token AAA. When it fails to do so, it tries to match any other token that also matches "aa" (the lexer does not backtrack to match A!). Since this is not possible, an error is produced.
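A minimal Java sketch of the failure described above, under a simplifying assumption: the lexer predicts which rule to enter from two characters of lookahead and, once inside AAA, has no way to back out and try A instead. `CommittingLexer` is a hypothetical name, this is not the code ANTLR generates, and the error text is only formatted to resemble the message from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a lexer that predicts a rule from fixed lookahead and then
// commits to it without backtracking (hypothetical; not ANTLR's generated code).
public class CommittingLexer {
    public static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Assumed prediction: 'a' followed by another 'a' selects rule AAA.
            if (input.startsWith("aa", pos)) {
                // Committed to AAA: match 'a' 'a' 'a' or fail; never fall back to A.
                for (int i = 0; i < 3; i++) {
                    if (pos >= input.length() || input.charAt(pos) != 'a') {
                        String found = pos >= input.length()
                                ? "'<EOF>'" : "'" + input.charAt(pos) + "'";
                        throw new IllegalStateException(
                                "line 1:" + pos + " mismatched character " + found + " expecting 'a'");
                    }
                    pos++;
                }
                tokens.add("AAA");
            } else if (input.charAt(pos) == 'a') {
                // A single 'a' not followed by another 'a' selects rule A.
                tokens.add("A");
                pos++;
            } else {
                throw new IllegalStateException("no viable alternative at position " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lex("aaa")); // [AAA]
        try {
            lex("aa");
        } catch (IllegalStateException e) {
            // line 1:2 mismatched character '<EOF>' expecting 'a'
            System.out.println(e.getMessage());
        }
    }
}
```

In this model, "aa" runs out of input while still inside AAA, which mirrors the reported error at column 2 expecting another 'a'.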

This is usually not a problem, since in practice there's often some sort of identifier rule that "aa" can fall back to. So, what actual problem are you trying to solve, or were you only curious about the inner workings? If it's the former, please edit your question and describe your actual problem.



Source: https://stackoverflow.com/questions/12190501/antlr-lexer-cant-lookahead-at-all
