So I'm writing a Python parser and I need to dynamically generate INDENT and DEDENT tokens (because Python doesn't use explicit delimiters) according to the Python grammar specification.
Basically I have a stack of integers representing indentation levels. In an embedded Java action in the INDENT token, I check if the current level of indentation is higher than the level on top of the stack; if it is, I push it on; if not, I call skip().
The problem is, if the current indentation level matches a level multiple levels down in the stack, I have to generate multiple DEDENT tokens, and I can't figure out how to do that.
My current code: (note that within_indent_block and current_indent_level are managed elsewhere)
    fragment DENT: {within_indent_block}? (SPACE|TAB)+;

    INDENT: {within_indent_block}? DENT
        {if(current_indent_level > whitespace_stack.peek().intValue()){
            whitespace_stack.push(new Integer(current_indent_level));
            within_indent_block = false;
        }else{
            skip();
        }
        }
        ;

    DEDENT: {within_indent_block}? DENT
        {if(current_indent_level < whitespace_stack.peek().intValue()){
            while(current_indent_level < whitespace_stack.peek().intValue()){
                whitespace_stack.pop();
                <<injectDedentToken()>>; //how do I do this
            }
        }else{
            skip();
        }
        }
        ;
How do I do this and / or is there a better way?
There are a few problems with the code you have posted.
- The INDENT and DEDENT rules are semantically identical (considering predicates and rule references, but ignoring actions). Since INDENT appears first, this means you can never have a token produced by the DEDENT rule in this grammar.
- The {within_indent_block}? predicate appears before you reference DENT as well as inside the DENT fragment rule itself. This duplication serves no purpose but will slow down your lexer.
The actual handling of post-matching actions is best placed in an override of Lexer.nextToken(). For example, you could start with something like the following.
    // Goes inside the lexer class (e.g. via @lexer::members);
    // needs java.util.ArrayDeque and java.util.Deque.
    private final Deque<Token> pendingTokens = new ArrayDeque<>();

    @Override
    public Token nextToken() {
        while (pendingTokens.isEmpty()) {
            Token token = super.nextToken();
            switch (token.getType()) {
            case INDENT:
                // handle indent here. to skip this token, simply don't add
                // anything to the pendingTokens queue and super.nextToken()
                // will be called again.
                break;
            case DEDENT:
                // handle dedent here. to skip this token, simply don't add
                // anything to the pendingTokens queue and super.nextToken()
                // will be called again.
                break;
            default:
                pendingTokens.add(token);
                break;
            }
        }
        return pendingTokens.poll();
    }
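To make the multiple-DEDENT case concrete, here is a minimal sketch of just the stack bookkeeping, written without any ANTLR dependency so it can be run on its own. The class name IndentTracker and the method onLineStart are hypothetical, it assumes the indentation width has already been measured as a column count, and it returns token names as strings rather than real Token objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper (not part of ANTLR): decides which tokens a new line should queue. */
final class IndentTracker {
    // Stack of open indentation widths. The bottom entry 0 is the top-level block
    // and is never popped, so peek() is always safe for non-negative input.
    private final Deque<Integer> levels = new ArrayDeque<>();

    IndentTracker() {
        levels.push(0);
    }

    /**
     * Returns the token names to enqueue for a line starting at the given width:
     * one "INDENT", zero or more "DEDENT"s, or nothing if the level is unchanged.
     */
    List<String> onLineStart(int indent) {
        if (indent < 0) {
            throw new IllegalArgumentException("indent must be non-negative");
        }
        List<String> out = new ArrayList<>();
        if (indent > levels.peek()) {
            // Deeper than the current block: open exactly one new level.
            levels.push(indent);
            out.add("INDENT");
            return out;
        }
        // Shallower: close one block per stack entry that is deeper than this line.
        while (indent < levels.peek()) {
            levels.pop();
            out.add("DEDENT");
        }
        // The line must land exactly on an enclosing level; anything else is
        // an inconsistent dedent, which this sketch reports as an error.
        if (indent != levels.peek()) {
            throw new IllegalStateException("inconsistent dedent to column " + indent);
        }
        return out;
    }
}
```

In the nextToken() override, each name returned here would correspond to one token added to pendingTokens, so a line that closes two blocks queues two DEDENT tokens and they are returned one at a time on successive calls. How those Token objects are constructed (for example with ANTLR's CommonToken) is left out of this sketch.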
Source: https://stackoverflow.com/questions/18158474/antlr4-dynamically-inject-token