Question
Forgive me if my terminology is off.
Let's say I have this bit of simplified grammar:
// parser
expr : COMPARATIVE;
// lexer
WS : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+;
COMPARATOR
: 'vs'
| 'versus'
;
ITEM
: 'boy'
| 'girl'
;
COMPARATIVE :ITEM WS* COMPARATOR WS* ITEM;
So this will of course match 'boy vs girl' or 'girl vs boy', etc.
My question is about what happens when I create a lexer, e.g.:
CharStream stream = new ANTLRInputStream("boy vs girl");
SearchLexer lex = new SearchLexer(stream);
CommonTokenStream tokens = new CommonTokenStream(lex);
tokens.fill();
// CommonTokenStream is not Iterable, so iterate over the buffered token list
for (Token token : tokens.getTokens()) {
    System.out.print(token.getType() + " [" + token.getText() + "] ");
}
This prints something like 9 [boy vs girl], so it matches my query fine. Now I want to get the sub-tokens of that single token.
My intuition tells me I need to use trees, but I really don't know how to do this in ANTLR 4 for my example. Thanks.
Answer 1:
Currently, COMPARATIVE is a lexer rule, which means the lexer will try to make a single token from all the text that matches the rule. You should instead make it a parser rule, comparative:
comparative : ITEM WS* COMPARATOR WS* ITEM;
Since COMPARATIVE will no longer be matched as a single token, you'll instead get individual tokens for ITEM, WS, and COMPARATOR. The expr rule should then reference comparative instead of COMPARATIVE.
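To make the difference concrete, here is a minimal plain-Java sketch that mimics the token stream you would get once COMPARATIVE is no longer a lexer rule. It does not use ANTLR at all: the class SubTokenSketch, its Tok type, and the tokenize method are hypothetical names made up for illustration, and java.util.regex stands in for the generated lexer, so treat it as an approximation of the behavior rather than what ANTLR does internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: emulates the lexer rules with java.util.regex, not ANTLR.
class SubTokenSketch {

    // Stand-in for an ANTLR Token: the rule name plus the matched text.
    static final class Tok {
        final String type;
        final String text;

        Tok(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // One named group per lexer rule (ITEM, COMPARATOR, WS) from the grammar.
    private static final Pattern RULES = Pattern.compile(
        "(?<ITEM>boy|girl)|(?<COMPARATOR>versus|vs)|(?<WS>[ \\t\\r\\n\\f]+)");

    // Splits the input into one token per rule match, WS included.
    static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // Only accept a match that starts exactly at the current position.
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("No rule matches at index " + pos);
            }
            String type = m.group("ITEM") != null ? "ITEM"
                        : m.group("COMPARATOR") != null ? "COMPARATOR"
                        : "WS";
            tokens.add(new Tok(type, m.group()));
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Tok t : tokenize("boy vs girl")) {
            System.out.print(t.type + " [" + t.text + "] ");
        }
        System.out.println();
    }
}
```

Running main prints ITEM [boy] WS [ ] COMPARATOR [vs] WS [ ] ITEM [girl]. With the real generated parser (assumed here to be named SearchParser), the same pieces should be reachable from the context object returned by parser.comparative(), through its generated accessors such as ITEM(0), COMPARATOR(), and ITEM(1).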
Two side notes:
If whitespace is not significant, you can hide it from the parser rules like this:
WS : ('\t' | ' ' | '\r' | '\n'| '\u000C')+ -> channel(HIDDEN);
You can then simplify your comparative parser rule to:
comparative : ITEM COMPARATOR ITEM;
In ANTLR 4, you can simplify character sets using a new syntax:
WS : [ \t\r\n\u000C]+ -> channel(HIDDEN);
Source: https://stackoverflow.com/questions/15776494/antlr-get-sub-tokens