Question
Building on the answer to "How to have both function calls and parenthetical grouping without backtrack", I'd like to add function literals. My current implementation, shown below, is not LL(*):
...
tokens {
...
FN;
ID_LIST;
}
stmt
: expr SEMI // SEMI=';'
;
callable
: ...
| fn
;
fn
: OPAREN opt_id_list CPAREN compound_stmt
-> ^(FN opt_id_list compound_stmt)
;
compound_stmt
: OBRACE stmt* CBRACE
;
opt_id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
What I'd like to do is allow anonymous function literals that consist of an argument list (e.g. (), (a) or (a, b, c)) followed by a compound_stmt. So (a, b, c){...} should be accepted, but (x)(y){} should not. (Of course, (x) * (y){} is "valid" as far as the parser is concerned, just as ((y){})()[1].x would be.)
Answer 1:
The parser needs a bit of extra lookahead. I guess it could be done without it, but that would definitely result in some horrible-looking parser rules that are a pain to maintain, and in a parser that accepts, for example, (a, 2, 3){...} (a function literal with an expression list instead of an id list). You would then have to do quite a bit of semantic checking after the AST has been created.
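To illustrate the kind of post-parse check that approach would require, here is a minimal Python sketch. The nested-tuple AST shape, the node names used in it and the function name check_fn_params are assumptions made up for this example; this is not what ANTLR itself produces.

```python
# Hypothetical AST shape: (node_type, child, child, ...).
# Leaves look like ("ID", "a") or ("INT", "2").
def check_fn_params(node, errors=None):
    """Collect an error for every FN parameter that is not a plain ID."""
    if errors is None:
        errors = []
    kind, *children = node
    if kind == "FN":
        params = children[0]
        for param in params[1:]:
            if param[0] != "ID":
                errors.append(
                    f"function parameter must be an identifier, got {param[0]}"
                )
    # Recurse into child nodes; plain strings are leaf payloads.
    for child in children:
        if isinstance(child, tuple):
            check_fn_params(child, errors)
    return errors


# (a, 2, 3){} as a permissive grammar might have parsed it
bad_fn = ("FN", ("PARAMS", ("ID", "a"), ("INT", "2"), ("INT", "3")), ("STATS",))
good_fn = ("FN", ("PARAMS", ("ID", "a"), ("ID", "b")), ("STATS",))
```

With these two trees, check_fn_params(bad_fn) reports the two INT parameters and check_fn_params(good_fn) returns an empty list. Every pass over the AST would have to carry checks like this one.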
The (IMO) best way to solve this is to add the function literal rule to callable and put a syntactic predicate in front of it, which tells the parser to make sure the input really matches that alternative before committing to it:
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
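To make the effect of the predicate concrete, here is a hand-written Python sketch of roughly what (fn_literal)=> does: run the rule speculatively, rewind, and only then commit to the alternative. It is a simplified recognizer, not ANTLR output. It builds no AST, flattens the arithmetic operators into one level and leaves out the relational and boolean operators. The names tokenize, Parser, predicts and accepts are made up for this sketch.

```python
import re

# ID, INT and single-character punctuation; leading whitespace is skipped.
TOKEN_RE = re.compile(r"\s*(?:(?P<ID>[a-z]+)|(?P<INT>[0-9]+)|(?P<OP>[-+*/(){}\[\].,;]))")


def tokenize(src):
    """Return token kinds: 'ID', 'INT', the punctuation itself, then 'EOF'."""
    src, pos, kinds = src.rstrip(), 0, []
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected input near offset {pos}")
        kinds.append(m.group("OP") or m.lastgroup)
        pos = m.end()
    return kinds + ["EOF"]


class Parser:
    def __init__(self, src):
        self.toks, self.i = tokenize(src), 0

    def peek(self):
        return self.toks[self.i]

    def match(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        self.i += 1

    def predicts(self, rule):
        """Syntactic predicate: try `rule`, always rewind, report success."""
        start = self.i
        try:
            rule()
            return True
        except SyntaxError:
            return False
        finally:
            self.i = start

    def prog(self):
        self.expr()
        self.match("EOF")

    def expr(self):  # simplified: one flat level of + - * /
        self.unary()
        while self.peek() in ("+", "-", "*", "/"):
            self.i += 1
            self.unary()

    def unary(self):
        if self.peek() == "-":
            self.i += 1
        if self.peek() == "INT":
            self.i += 1
        else:
            self.call()

    def call(self):
        self.callable_()
        while self.peek() in ("(", "[", "."):
            if self.peek() == "(":
                self.i += 1
                self.params()
                self.match(")")
            elif self.peek() == "[":
                self.i += 1
                self.expr()
                self.match("]")
            else:
                self.i += 1
                self.match("ID")

    def callable_(self):
        if self.peek() == "(" and self.predicts(self.fn_literal):
            self.fn_literal()  # predicate succeeded, so commit
        elif self.peek() == "(":
            self.i += 1
            self.expr()
            self.match(")")
        else:
            self.match("ID")

    def fn_literal(self):
        self.match("(")
        if self.peek() == "ID":  # id_list
            self.i += 1
            while self.peek() == ",":
                self.i += 1
                self.match("ID")
        self.match(")")
        self.match("{")  # compound_stmt
        while self.peek() != "}":
            self.expr()
            self.match(";")
        self.match("}")

    def params(self):
        if self.peek() != ")":
            self.expr()
            while self.peek() == ",":
                self.i += 1
                self.expr()


def accepts(src):
    try:
        Parser(src).prog()
        return True
    except SyntaxError:
        return False
```

In this sketch, accepts("(x)(y){}") is False: the predicate fails on (x) because no { follows, so (x)(y) is read as a call and the trailing {} is left over. accepts("(x) * (y){ x.y; }") is True, and accepts("(a, 2, 3){ a; }") is False because neither alternative fits.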
A demo:
grammar T;
options {
output=AST;
}
tokens {
// literal tokens
EQ = '==' ;
GT = '>' ;
LT = '<' ;
GTE = '>=' ;
LTE = '<=' ;
LAND = '&&' ;
LOR = '||' ;
PLUS = '+' ;
MINUS = '-' ;
TIMES = '*' ;
DIVIDE = '/' ;
OPAREN = '(' ;
CPAREN = ')' ;
OBRACK = '[' ;
CBRACK = ']' ;
DOT = '.' ;
COMMA = ',' ;
OBRACE = '{' ;
CBRACE = '}' ;
SEMI = ';' ;
// imaginary tokens
CALL;
INDEX;
LOOKUP;
UNARY_MINUS;
PARAMS;
FN;
ID_LIST;
STATS;
}
prog
: expr EOF -> expr
;
expr
: boolExpr
;
boolExpr
: relExpr ((LAND | LOR)^ relExpr)?
;
relExpr
: (a=addExpr -> $a) ( (oa=relOp b=addExpr -> ^($oa $a $b))
( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
)?
)?
;
addExpr
: mulExpr ((PLUS | MINUS)^ mulExpr)*
;
mulExpr
: unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
;
unaryExpr
: MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
| atomExpr
;
atomExpr
: INT
| call
;
call
: (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
| OBRACK expr CBRACK -> ^(INDEX $call expr)
| DOT ID -> ^(INDEX $call ID)
)*
;
callable
: (fn_literal)=> fn_literal
| OPAREN expr CPAREN -> expr
| ID
;
fn_literal
: OPAREN id_list CPAREN compound_stmt -> ^(FN id_list compound_stmt)
;
id_list
: (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
;
params
: (expr (COMMA expr)*)? -> ^(PARAMS expr*)
;
compound_stmt
: OBRACE stmt* CBRACE -> ^(STATS stmt*)
;
stmt
: expr SEMI
;
relOp
: EQ | GT | LT | GTE | LTE
;
ID : 'a'..'z'+ ;
INT : '0'..'9'+ ;
SPACE : (' ' | '\t') {skip();};
A parser generated from the grammar above rejects the input (x)(y){} while it properly parses the following three snippets of code:
1. (a, b, c){ a+b*c; }
2. (x) * (y){ x.y; }
3. ((y){})()[1].x
Source: https://stackoverflow.com/questions/10762153/adding-function-literals-while-abstaining-from-backtracking