ANTLR Extraneous Input

拟墨画扇 submitted on 2019-12-04 07:34:31

The string "keyup" is being tokenized as a NAME token: that is the problem.

You must realize that the lexer operates independently from the parser. If the parser is trying to match a KEYPRESS token, the lexer does not "listen" to it, but just constructs a token following the rules:

  1. match the rule that consumes the most characters
  2. if several rules match the same number of characters, choose the one that is defined first
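The two rules above can be sketched in a few lines of Python. This is only a simulation of the selection logic, not how ANTLR is implemented: the helper `next_token` and the two-rule lexer `RULES` are hypothetical names made up for illustration, and each pattern is a single regular expression with no alternatives.

```python
import re

def next_token(rules, text, pos=0):
    """Return (rule_name, lexeme) for the token starting at pos, or None.

    rules is an ordered list of (name, regex) pairs; list order stands in
    for the order in which the lexer rules are defined in the grammar.
    """
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        if m is None or m.end() == pos:
            continue  # the rule does not match, or matches nothing
        # Rule 1: prefer the longer match.
        # Rule 2: the strict '>' keeps the earlier rule on a tie.
        if best is None or len(m.group()) > len(best[1]):
            best = (name, m.group())
    return best

# Hypothetical two-rule lexer, not taken from the question's grammar.
RULES = [("IF", r"if"), ("ID", r"[a-z]+")]

print(next_token(RULES, "if"))    # ('IF', 'if')     tie, first rule wins
print(next_token(RULES, "iffy"))  # ('ID', 'iffy')   longer match wins
```

Note that the parser plays no part here: the choice depends only on the input text and the rule order.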

Taking these rules into account, and the order of your rules:

NAME : [A-Za-z_][A-Za-z_0-9]* ;

INT : [0-9]+ ;

KEY : [a-z] | [0-9] | 'shift' | 'ctrl' | 'alt' | 'meta' | 'space' | 'left' | 'right' | 'up' | 'down' | 'minus' | 'equals' | 'backspace' | 'openbracket' | 'closebracket' | 'backslash' | 'semicolon' | 'quote' | 'enter' | 'comma' | 'period' | 'slash' ;

KEYPRESS : 'keyup' | 'keydown' ;

a NAME token will be created instead of most of the KEY alternatives and all of the KEYPRESS alternatives: NAME matches the same characters and is defined first.

And since an INT matches one or more digits and is defined before KEY which also has a single digit alternative, it is clear that the lexer will never produce a KEY or KEYPRESS token.
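To check this claim, here is a rough Python simulation of the original rule order. It assumes the "longest match, then first defined" behaviour described above; `ORIGINAL_ORDER`, `longest` and `token_type` are hypothetical helper names, and each lexer rule is modelled as a list of alternatives so that the longest alternative wins inside a rule as well.

```python
import re

KEY_WORDS = ["shift", "ctrl", "alt", "meta", "space", "left", "right", "up",
             "down", "minus", "equals", "backspace", "openbracket",
             "closebracket", "backslash", "semicolon", "quote", "enter",
             "comma", "period", "slash"]

# (rule name, alternatives as regexes), in the order of the original grammar.
ORIGINAL_ORDER = [
    ("NAME", [r"[A-Za-z_][A-Za-z_0-9]*"]),
    ("INT", [r"[0-9]+"]),
    ("KEY", [r"[a-z]", r"[0-9]"] + KEY_WORDS),
    ("KEYPRESS", ["keyup", "keydown"]),
]

def longest(alternatives, text):
    """Length of the longest prefix of text matched by any alternative."""
    lengths = [m.end() for alt in alternatives if (m := re.match(alt, text))]
    return max(lengths, default=0)

def token_type(rules, text):
    """Type of the first token in text: longest match, earliest rule on ties."""
    best_name, best_len = None, 0
    for name, alternatives in rules:
        n = longest(alternatives, text)
        if n > best_len:  # strict '>' keeps the earlier rule on a tie
            best_name, best_len = name, n
    return best_name

for sample in ["keyup", "shift", "a", "5"]:
    print(sample, "->", token_type(ORIGINAL_ORDER, sample))
# keyup -> NAME
# shift -> NAME
# a -> NAME
# 5 -> INT
```

In this simulation no input ever comes out as KEY or KEYPRESS, which matches the reasoning above.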

If you move the NAME and INT rules below the KEY and KEYPRESS rules, I expect most of the tokens will be constructed the way you want.

EDIT

A possible solution would look like:

KEY : [a-z] | 'shift' | 'ctrl' | 'alt' | 'meta' | 'space' | 'left' | 'right' | 'up' | 'down' | 'minus' | 'equals' | 'backspace' | 'openbracket' | 'closebracket' | 'backslash' | 'semicolon' | 'quote' | 'enter' | 'comma' | 'period' | 'slash' ;

KEYPRESS : 'keyup' | 'keydown' ;

NAME : [A-Za-z_][A-Za-z_0-9]* ;

SINGLE_DIGIT : [0-9] ;

INT : [0-9]+ ;

I.e. I removed the [0-9] alternative from KEY and introduced a SINGLE_DIGIT rule (which is placed before the INT rule!).
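The effect of the new order can be checked with the same kind of rough Python simulation, again assuming the "longest match, then first defined" behaviour. `NEW_ORDER`, `longest` and `token_type` are hypothetical names for this sketch, not part of ANTLR.

```python
import re

KEY_WORDS = ["shift", "ctrl", "alt", "meta", "space", "left", "right", "up",
             "down", "minus", "equals", "backspace", "openbracket",
             "closebracket", "backslash", "semicolon", "quote", "enter",
             "comma", "period", "slash"]

# (rule name, alternatives as regexes), in the order of the revised grammar.
NEW_ORDER = [
    ("KEY", [r"[a-z]"] + KEY_WORDS),
    ("KEYPRESS", ["keyup", "keydown"]),
    ("NAME", [r"[A-Za-z_][A-Za-z_0-9]*"]),
    ("SINGLE_DIGIT", [r"[0-9]"]),
    ("INT", [r"[0-9]+"]),
]

def longest(alternatives, text):
    """Length of the longest prefix of text matched by any alternative."""
    lengths = [m.end() for alt in alternatives if (m := re.match(alt, text))]
    return max(lengths, default=0)

def token_type(rules, text):
    """Type of the first token in text: longest match, earliest rule on ties."""
    best_name, best_len = None, 0
    for name, alternatives in rules:
        n = longest(alternatives, text)
        if n > best_len:  # strict '>' keeps the earlier rule on a tie
            best_name, best_len = name, n
    return best_name

for sample in ["keyup", "shift", "a", "5", "42", "keyupx", "foo"]:
    print(sample, "->", token_type(NEW_ORDER, sample))
# keyup -> KEYPRESS
# shift -> KEY
# a -> KEY
# 5 -> SINGLE_DIGIT
# 42 -> INT
# keyupx -> NAME
# foo -> NAME
```

The "keyupx" and "foo" cases show that longer identifiers still become NAME, because the longest match is preferred before rule order is considered.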

Now create some extra parser rules:

integer : INT | SINGLE_DIGIT ;

key : KEY | SINGLE_DIGIT ;

and change all occurrences of INT inside parser rules to integer, and all occurrences of KEY to key. (Don't call your rule int: it is a reserved word in Java, the default target language.)
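What the two parser rules buy you can be shown with a small Python sketch. It models each parser rule as the set of token types it accepts; `PARSER_RULES` and `accepts` are hypothetical names, and a real ANTLR parser does considerably more than a set lookup.

```python
# Token types each parser rule accepts, mirroring
#   integer : INT | SINGLE_DIGIT ;
#   key     : KEY | SINGLE_DIGIT ;
PARSER_RULES = {
    "integer": {"INT", "SINGLE_DIGIT"},
    "key": {"KEY", "SINGLE_DIGIT"},
}

def accepts(rule, token_type):
    """True if the given parser rule can match a token of this type."""
    return token_type in PARSER_RULES[rule]

# A lone digit such as "5" is lexed as SINGLE_DIGIT, so it is usable
# both where a key and where an integer is expected.
print(accepts("key", "SINGLE_DIGIT"))      # True
print(accepts("integer", "SINGLE_DIGIT"))  # True
print(accepts("key", "INT"))               # False
```

The lexer still makes one context-free decision per input; it is the parser rules that let the same SINGLE_DIGIT token play either role.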

You might also want to do something similar for NAME and the [a-z] alternative in KEY: with the rules above, a single lowercase character is never tokenized as a NAME, always as a KEY.
