Antlr (lexer): matching the right token

Submitted by 痞子三分冷 on 2019-12-11 00:22:55

Question


In my ANTLR 3 grammar, I have several "overlapping" lexer rules, like this:

NAT: ('0' .. '9')+ ;
INT: ('+' | '-')? ('0' .. '9')+ ;
BITVECTOR: ('0' | '1')* ;

Although input such as 100110 or 123 can be matched by more than one of those rules, the context always determines which token it has to be. Example:

s: a | b | c ;
a: '<' NAT '>' ;
b: '{' INT '}' ;
c: '[' BITVECTOR ']' ;

The input {17} should then be matched as {, INT, and }, but by the time the parser sees it the lexer has already decided that 17 is a NAT token. How can I prevent this behavior? The backtrack option is already set to true, but it seems to affect only parser rules.


Answer 1:


There might be a complicated way to make the lexer context-sensitive, but in general that is the parser's job: the lexer should just provide a stream of tokens. My recommendation is to refactor your lexer to return DIGITS and SIGN tokens, and let the parser work out from context what kind of number the digits represent.
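
A minimal sketch of that refactoring, based on the rules from the question. The grammar name Numbers and the parser rule names nat, integer, and bitvector are made up for illustration, and the grammar has not been run through ANTLR. It also assumes that the check for a bit vector containing only 0 and 1 happens in a parser action or a later validation pass, since the lexer no longer distinguishes the three kinds of digit sequence.

```antlr
grammar Numbers;

// Parser rules: the surrounding brackets decide how the digits are read.
s : a | b | c ;
a : '<' nat '>' ;
b : '{' integer '}' ;
c : '[' bitvector ']' ;

nat       : DIGITS ;        // unsigned number
integer   : SIGN? DIGITS ;  // optionally signed number
bitvector : DIGITS? ;       // may be empty; 0/1-only check is done elsewhere (assumption)

// Lexer rules: no overlap, so the lexer never has to guess.
SIGN   : '+' | '-' ;
DIGITS : ('0' .. '9')+ ;
```

With rules like these, the lexer should turn {17} into {, DIGITS, and } regardless of context, and it is rule b that interprets the digits as an integer.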



Source: https://stackoverflow.com/questions/3910197/antlr-lexer-matching-the-right-token
