Antlr get Sub-Tokens

Submitted by 左心房为你撑大大i on 2019-12-25 02:50:57

Question


Forgive me if my terminology is off.

Let's say I have this simplified grammar:

// parser
expr : COMPARATIVE;

// lexer
WS : ( '\t' | ' ' | '\r' | '\n' | '\u000C' )+;
COMPARATOR
    : 'vs'
    | 'versus'
    ;
ITEM
    : 'boy'
    | 'girl'
    ;
COMPARATIVE : ITEM WS* COMPARATOR WS* ITEM;

So this will, of course, match 'boy vs girl', 'girl vs boy', and so on. My question is about what happens when I create a lexer, i.e.

CharStream stream = new ANTLRInputStream("boy vs girl");
SearchLexer lex = new SearchLexer(stream);
CommonTokenStream tokens = new CommonTokenStream(lex);
tokens.fill();
for (Token token : tokens.getTokens()) {
    System.out.print(token.getType() + " [" + token.getText() + "] ");
}

This prints something like 9 [boy vs girl]; that is, it matches my query fine, but as one token. Now I want to get the sub-tokens of that single token.

My intuition tells me I need to use trees, but I really don't know how to do this in ANTLR 4 for my example. Thanks.


Answer 1:


Currently, COMPARATIVE is a lexer rule, which means the lexer will try to make a single token from all the text that matches the rule. You should instead make it a parser rule named comparative (parser rule names start with a lowercase letter):

comparative : ITEM WS* COMPARATOR WS* ITEM;

Since COMPARATIVE will no longer be considered a single token, you'll instead get individual tokens for ITEM, WS, and COMPARATOR. Remember to update expr so that it references comparative instead of COMPARATIVE.

Two side notes:

  1. If whitespace is not significant, you can hide it from the parser rules like this:

    WS : ('\t' | ' ' | '\r' | '\n'| '\u000C')+ -> channel(HIDDEN);
    

    You can then simplify the comparative parser rule to:

    comparative : ITEM COMPARATOR ITEM;
    
  2. In ANTLR 4, you can write character sets more compactly using the new set syntax:

    WS : [ \t\r\n\u000C]+ -> channel(HIDDEN);
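
Putting the pieces of this answer together, the whole grammar might look roughly like the sketch below. The grammar name Search is an assumption inferred from the SearchLexer class in the question, and the EOF at the end of expr is an optional addition (not part of the original grammar) that makes the parser consume the entire input.

```antlr
// Assumed grammar name, inferred from SearchLexer in the question.
grammar Search;

// Parser rules (names start with a lowercase letter).
// EOF is an optional addition so that trailing input is not silently ignored.
expr        : comparative EOF ;
comparative : ITEM COMPARATOR ITEM ;

// Lexer rules (names start with an uppercase letter).
COMPARATOR  : 'vs' | 'versus' ;
ITEM        : 'boy' | 'girl' ;

// Whitespace is still tokenized, but sent to the hidden channel
// so the parser rules never see it.
WS          : [ \t\r\n\u000C]+ -> channel(HIDDEN) ;
```

With a grammar along these lines, the token loop from the question should print separate ITEM and COMPARATOR tokens (plus the hidden WS tokens and the end-of-file token) instead of one token covering the whole input.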
    


Source: https://stackoverflow.com/questions/15776494/antlr-get-sub-tokens
