Question
I am writing an ANTLR grammar to parse log files and have run into a problem. I have simplified my grammar to reproduce the problem as follows:
stmt1:
'[ ' elapse ': ' stmt2
;
stmt2:
'[xxx'
;
stmt3:
': [yyy'
;
elapse :
FLOAT;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')*
;
When I used the following string to test the grammar:
[ 98.9: [xxx
I got the error:
E:\work\antlr\output\__Test___input.txt line 1:9 mismatched character 'x' expecting 'y'
E:\work\antlr\output\__Test___input.txt line 1:10 no viable alternative at character 'x'
E:\work\antlr\output\__Test___input.txt line 1:11 no viable alternative at character 'x'
E:\work\antlr\output\__Test___input.txt line 1:12 mismatched input '<EOF>' expecting ': '
But if I remove the rule 'stmt3', the same string is accepted.
I am not sure what happened...
Thanks for any advice!
Leon
Thanks to Bart for the help. I have corrected the grammar. The bottom line, I think, is that every token has to be unambiguous. I also added a WS token to simplify the rules:
stmt1:
'[' elapse ':' stmt2
;
stmt2:
'[' 'xxx'
;
stmt3:
':' '[' 'yyy'
;
elapse :
FLOAT;
FLOAT
: ('0'..'9')+ '.' ('0'..'9')*
;
WS : (' ' |'\t' |'\n' |'\r' )+ {skip();} ;
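To illustrate why the corrected grammar accepts the test string, here is a minimal Python sketch that tokenizes it with the same single-purpose tokens. This is not ANTLR output: the token names LBRACK, COLON, XXX and YYY and the function tokenize_fixed are made up for this illustration, and a regular-expression alternation stands in for the generated lexer.

```python
import re

# Token definitions mirroring the corrected grammar.
# The names LBRACK, COLON, XXX and YYY are hypothetical; ANTLR generates its own.
TOKEN_SPEC = [
    ("FLOAT", r"[0-9]+\.[0-9]*"),
    ("LBRACK", r"\["),
    ("COLON", r":"),
    ("XXX", r"xxx"),
    ("YYY", r"yyy"),
    ("WS", r"[ \t\r\n]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize_fixed(text):
    """Split text into (token name, matched text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"no viable alternative at character {text[pos]!r} (column {pos})")
        if match.lastgroup != "WS":  # same effect as {skip();} on the WS rule
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize_fixed("[ 98.9: [xxx"))
```

Because no token starts with the text of another token, the lexer never has to choose between a short and a long literal, and the parser rules alone decide between stmt2 and stmt3.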
Answer 1:
ANTLR has a strict separation between lexer rules (tokens) and parser rules. Although you defined some literals inside parser rules, they are still tokens. This means the following grammar is equivalent (in practice) to your example grammar:
stmt1 : T1 elapse T2 stmt2 ;
stmt2 : T3 ;
stmt3 : T4 ;
elapse : FLOAT;
T1 : '[ ' ;
T2 : ': ' ;
T3 : '[xxx' ;
T4 : ': [yyy' ;
FLOAT : ('0'..'9')+ '.' ('0'..'9')* ;
Now, when the lexer tries to construct tokens from the input "[ 98.9: [xxx", it successfully creates the tokens T1 and FLOAT, but when it sees ": [", it tries to construct a T4 token. When the next char in the stream turns out to be an "x" instead of a "y", the lexer looks for another token that starts with ": [". Since there is no such token, the lexer emits the error:
[...] mismatched character 'x' expecting 'y'
And no, the lexer will not backtrack to "give up" the character "[" from ": [" in order to match the token T2, nor will it look ahead in the char-stream to see whether a T4 token can really be constructed. ANTLR's LL(*) is only applicable to parser rules, not lexer rules!
Source: https://stackoverflow.com/questions/13125935/what-is-the-wrong-with-the-simple-antlr-grammar