ANTLR4: dynamically inject token

自作多情 posted on 2019-12-08 07:17:33

Question


So I'm writing a Python parser and I need to dynamically generate INDENT and DEDENT tokens (because Python doesn't use explicit block delimiters) according to the Python grammar specification.

Basically I have a stack of integers representing indentation levels. In an embedded Java action in the INDENT token, I check if the current level of indentation is higher than the level on top of the stack; if it is, I push it on; if not, I call skip().

The problem is, if the current indentation level matches a level multiple levels down in the stack, I have to generate multiple DEDENT tokens, and I can't figure out how to do that.

My current code: (note that within_indent_block and current_indent_level are managed elsewhere)

fragment DENT: {within_indent_block}? (SPACE|TAB)+;

INDENT: {within_indent_block}? DENT
        {if(current_indent_level > whitespace_stack.peek().intValue()){
                 whitespace_stack.push(new Integer(current_indent_level));
                 within_indent_block = false;
         }else{
                 skip();
         }
         }
         ;    

DEDENT: {within_indent_block}? DENT
        {if(current_indent_level < whitespace_stack.peek().intValue()){
            while(current_indent_level < whitespace_stack.peek().intValue()){
                      whitespace_stack.pop();
                      <<injectDedentToken()>>; //how do I do this
            }
         }else{
               skip();
         }
         }
         ;

How do I do this and / or is there a better way?


Answer 1:


There are a few problems with the code you have posted.

  1. The INDENT and DEDENT rules are semantically identical (considering predicates and rule references, but ignoring actions). Since INDENT appears first, this means you can never have a token produced by the DEDENT rule in this grammar.
  2. The {within_indent_block}? predicate appears before you reference DENT as well as inside the DENT fragment rule itself. This duplication serves no purpose but will slow down your lexer.

The actual handling of post-matching actions is best placed in an override of Lexer.nextToken(). For example, you could start with something like the following.

private final Deque<Token> pendingTokens = new ArrayDeque<>();

@Override
public Token nextToken() {
    while (pendingTokens.isEmpty()) {
        Token token = super.nextToken();
        switch (token.getType()) {
        case INDENT:
            // handle indent here. to skip this token, simply don't add
            // anything to the pendingTokens queue and super.nextToken()
            // will be called again.
            break;

        case DEDENT:
            // handle dedent here. to emit several DEDENT tokens, add one
            // to the pendingTokens queue per popped level. to skip this
            // token, add nothing and super.nextToken() will be called again.
            break;

        default:
            pendingTokens.add(token);
            break;
        }
    }

    return pendingTokens.poll();
}
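
The bookkeeping that would go inside the INDENT and DEDENT cases can be sketched on its own, without the ANTLR runtime. This is only a minimal sketch under stated assumptions: IndentTracker, onLineStart and drain are hypothetical names that appear in neither the question nor ANTLR, indentation is assumed to be measured as a non-negative column count, and plain strings stand in for real tokens.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper (not an ANTLR class): maps line indentation to INDENT/DEDENT markers. */
public class IndentTracker {
    // Stack of open indentation levels; column 0 always stays at the bottom.
    private final Deque<Integer> levels = new ArrayDeque<>();
    // Markers waiting to be handed out, oldest first (plays the role of pendingTokens).
    private final Deque<String> pending = new ArrayDeque<>();

    public IndentTracker() {
        levels.push(0);
    }

    /** Queue the markers implied by a line starting at the given non-negative width. */
    public void onLineStart(int width) {
        if (width > levels.peek()) {
            // Deeper than the current block: open exactly one new level.
            levels.push(width);
            pending.add("INDENT");
            return;
        }
        // Shallower: close one level per pop, so several DEDENTs may be queued.
        while (width < levels.peek()) {
            levels.pop();
            pending.add("DEDENT");
        }
        // After popping, the width must match an enclosing level exactly.
        if (width != levels.peek()) {
            throw new IllegalStateException("inconsistent dedent to column " + width);
        }
    }

    /** Return and clear the queued markers. */
    public List<String> drain() {
        List<String> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }

    public static void main(String[] args) {
        IndentTracker tracker = new IndentTracker();
        for (int width : new int[] {0, 4, 8, 0}) {
            tracker.onLineStart(width);
            System.out.println(width + " -> " + tracker.drain());
        }
        // Prints:
        // 0 -> []
        // 4 -> [INDENT]
        // 8 -> [INDENT]
        // 0 -> [DEDENT, DEDENT]
    }
}
```

In the nextToken() override, each queued marker would correspond to one token added to pendingTokens. Because the loop there only calls super.nextToken() while the queue is empty, all queued DEDENT tokens are returned before the lexer reads any further input.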


Source: https://stackoverflow.com/questions/18158474/antlr4-dynamically-inject-token
