What is wrong with this simple ANTLR grammar?

Submitted by 假装没事ソ on 2019-12-24 10:31:07

Question


I am writing an ANTLR grammar to parse log files and have run into a problem. I have simplified my grammar to reproduce the problem as follows:

stmt1:
  '[ ' elapse ': ' stmt2
  ;

stmt2:
  '[xxx'
  ;

stmt3:
  ': [yyy'
  ;

elapse :
  FLOAT;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* 
    ;

When I used the following string to test the grammar:

[ 98.9: [xxx

I got the error:

E:\work\antlr\output\__Test___input.txt line 1:9 mismatched character 'x' expecting 'y'
E:\work\antlr\output\__Test___input.txt line 1:10 no viable alternative at character 'x'
E:\work\antlr\output\__Test___input.txt line 1:11 no viable alternative at character 'x'
E:\work\antlr\output\__Test___input.txt line 1:12 mismatched input '<EOF>' expecting ': '

But if I remove the rule 'stmt3', the same string is accepted.

I am not sure what happened...

Thanks for any advice!

Leon


Thanks to Bart for the help. I have tried to correct the grammar. The bottom line, I think, is that I have to disambiguate all tokens. I also added a WS token to simplify the rules.

stmt1:
  '[' elapse ':' stmt2
  ;

stmt2:
  '[' 'xxx'
  ;

stmt3:
  ':' '[' 'yyy'
  ;

elapse :
  FLOAT;

FLOAT
    :   ('0'..'9')+ '.' ('0'..'9')* 
    ;

WS : (' ' |'\t' |'\n' |'\r' )+ {skip();} ;   
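As a rough illustration of why the corrected grammar behaves better, here is a small Python sketch of the token stream it should produce. It is a regex-based stand-in and not ANTLR itself; the token names LBRACK, COLON, XXX and YYY and the function `tokenize` are made up for this example (ANTLR generates its own names for literals).

```python
import re

# One pattern per token of the corrected grammar. Names other than FLOAT and
# WS are hypothetical labels for the literals '[', ':', 'xxx' and 'yyy'.
TOKEN_SPEC = [
    ("FLOAT", r"[0-9]+\.[0-9]*"),
    ("LBRACK", r"\["),
    ("COLON", r":"),
    ("XXX", r"xxx"),
    ("YYY", r"yyy"),
    ("WS", r"[ \t\n\r]+"),
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(text):
    """Split text into (name, lexeme) pairs, skipping whitespace like skip()."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no viable alternative at character {text[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("[ 98.9: [xxx"))
```

Because no token now shares a prefix with a longer one, ":" and "[" are always emitted on their own, and it is left to the parser rules to decide whether "xxx" or "yyy" may follow.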

Answer 1:


ANTLR has a strict separation between lexer rules (tokens) and parser rules. Although you defined some literals inside parser rules, they are still tokens. This means the following grammar is equivalent (in practice) to your example grammar:

stmt1  : T1 elapse T2 stmt2 ;
stmt2  : T3 ;
stmt3  : T4 ;
elapse : FLOAT;

T1     : '[ ' ;
T2     : ': ' ;
T3     : '[xxx' ;
T4     : ': [yyy' ;
FLOAT  : ('0'..'9')+ '.' ('0'..'9')* ;

Now, when the lexer tries to construct tokens from the input "[ 98.9: [xxx", it successfully creates the tokens T1 and FLOAT. When it then sees ": [", it commits to constructing a T4 token, because T4 is the only token that starts with ": [". The next char in the stream is an "x" instead of the "y" that T4 requires, and since no other token starts with ": [", the lexer emits the error:

[...] mismatched character 'x' expecting 'y'

And no, the lexer will not backtrack to "give up" the character "[" from ": [" to match the token T2, nor will it look ahead in the char-stream to see if a T4 token can really be constructed. ANTLR's LL(*) is only applicable to parser rules, not lexer rules!
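The behaviour described above can be mimicked with a toy lexer in Python. This is only a simplified model, not ANTLR's actual prediction algorithm: it follows the input through the token literals as far as it can and never falls back to a shorter token it passed along the way. The names `next_token`, `tokenize` and `LexError` are invented for this sketch.

```python
import re

# Tokens of the original grammar, named as in the equivalent grammar above.
LITERALS = {"T1": "[ ", "T2": ": ", "T3": "[xxx", "T4": ": [yyy"}
FLOAT = re.compile(r"[0-9]+\.[0-9]*")


class LexError(Exception):
    pass


def next_token(text, pos, literals):
    """Return (name, end) for the token at pos; never backtracks."""
    m = FLOAT.match(text, pos)
    if m:
        return "FLOAT", m.end()
    candidates = literals
    i = 0
    while True:
        ch = text[pos + i] if pos + i < len(text) else "<EOF>"
        # Literals that are still consistent with the input at offset i.
        alive = {n: lit for n, lit in candidates.items()
                 if i < len(lit) and lit[i] == ch}
        if not alive:
            break
        candidates = alive
        i += 1
    if i == 0:
        raise LexError(f"line 1:{pos} no viable alternative at character {ch!r}")
    for name, lit in candidates.items():
        if len(lit) == i:  # a literal ends exactly where the walk stopped
            return name, pos + i
    # Committed to a longer literal; a shorter one (e.g. T2) is not reconsidered.
    expected = sorted({lit[i] for lit in candidates.values()})[0]
    raise LexError(
        f"line 1:{pos + i} mismatched character {ch!r} expecting {expected!r}")


def tokenize(text, literals=LITERALS):
    names, pos = [], 0
    while pos < len(text):
        name, pos = next_token(text, pos, literals)
        names.append(name)
    return names
```

With all four literals, `tokenize("[ 98.9: [xxx")` raises `LexError` with the message `line 1:9 mismatched character 'x' expecting 'y'`, matching the first error in the question. If T4 is left out of the `literals` argument (the equivalent of removing stmt3), the same input yields T1, FLOAT, T2, T3.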



Source: https://stackoverflow.com/questions/13125935/what-is-the-wrong-with-the-simple-antlr-grammar
