adding (…) {…} function literals while abstaining from backtracking

Submitted by 亡梦爱人 on 2019-12-04 05:19:59

Question


Building off the answer to "How to have both function calls and parenthetical grouping without backtrack", I'd like to add function literals. Ignoring the LL(*) restriction for a moment, they would be implemented like this:

...

tokens {
 ...
 FN;
 ID_LIST;
}

stmt
 : expr SEMI // SEMI=';'
 ;

callable
 : ...
 | fn
 ;

fn
 : OPAREN opt_id_list CPAREN compound_stmt
   -> ^(FN opt_id_list compound_stmt)
 ;

compound_stmt
 : OBRACE stmt* CBRACE
 ;

opt_id_list
 : (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
 ;

What I'd like to do is allow anonymous function literals that have an argument list (e.g. () or (a) or (a, b, c)) followed by a compound_stmt. So (a, b, c){...} is good. But (x)(y){} not so much. (Of course (x) * (y){} is "valid" in terms of the parser, just as ((y){})()[1].x would be.)


Answer 1:


The parser needs a bit of extra lookahead. I guess it could be done without it, but that would definitely result in some horrible-looking parser rules that are a pain to maintain, and in a parser that would accept, for example, (a, 2, 3){...} (a function literal with an expression list instead of an id list). You would then have to do quite a bit of semantic checking after the AST has been created.
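To give an idea of the kind of after-the-fact checking meant here, below is a minimal Python sketch. Everything in it is an assumption made for illustration: the nested-tuple AST shape, the node names, and the helper `check_fn_params` are hypothetical and are not produced by ANTLR or by any grammar in this thread.

```python
# Hypothetical AST shape (an assumption for this sketch): nested tuples such as
#   ("FN", ("PARAMS", ("ID", "a"), ("INT", "2")), ("STATS",))
# as an imagined predicate-free grammar might build after parsing "(...)"
# as an ordinary expression list.

def check_fn_params(node):
    """Collect an error message for every FN parameter that is not a plain ID."""
    errors = []
    if not isinstance(node, tuple):
        return errors                      # leaf text such as "a" or "2"
    if node[0] == "FN":
        params = node[1]
        for child in params[1:]:
            if child[0] != "ID":
                errors.append(
                    f"function parameter must be an identifier, "
                    f"got {child[0]} {child[1]!r}"
                )
    for child in node[1:]:                 # also check nested function literals
        errors.extend(check_fn_params(child))
    return errors


# (a, 2, 3){} parsed loosely: two of the three "parameters" are not identifiers.
bad = ("FN", ("PARAMS", ("ID", "a"), ("INT", "2"), ("INT", "3")), ("STATS",))
for message in check_fn_params(bad):
    print(message)
```

A pass like this has to run over every tree the parser produces, which is the extra work the approach described next avoids.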

The (IMO) best way to solve this is to add the function literal rule to callable and put a syntactic predicate in front of it, which tells the parser to make sure there really is such an alternative ahead before actually matching it.

callable
 : (fn_literal)=> fn_literal
 | OPAREN expr CPAREN -> expr
 | ID
 ;
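For readers who want to see the mechanics, here is a rough hand-written Python recognizer that mimics what the predicate does: try fn_literal speculatively, rewind, and only then commit to an alternative. It is a sketch under assumptions, not what ANTLR generates: the class and method names are made up, the relational and boolean operators are left out, and all arithmetic operators share one loop because only accept/reject is checked.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([a-z]+)|(.))")

class ParseError(Exception):
    pass

def tokenize(text):
    """Turn the input into a list of token kinds: INT, ID, or the symbol itself."""
    toks = []
    for num, ident, sym in TOKEN_RE.findall(text.strip()):
        toks.append("INT" if num else "ID" if ident else sym)
    return toks + ["EOF"]

class Parser:
    def __init__(self, text):
        self.toks, self.pos = tokenize(text), 0

    def peek(self):
        return self.toks[self.pos]

    def match(self, kind):
        if self.peek() != kind:
            raise ParseError(f"expected {kind}, found {self.peek()}")
        self.pos += 1

    def speculate(self, rule):
        """Syntactic predicate: run the rule, report success, always rewind."""
        start = self.pos
        try:
            rule()
            return True
        except ParseError:
            return False
        finally:
            self.pos = start

    def prog(self):
        self.expr()
        self.match("EOF")

    def expr(self):                        # precedence ignored: recognition only
        self.unary()
        while self.peek() in {"+", "-", "*", "/"}:
            self.pos += 1
            self.unary()

    def unary(self):
        if self.peek() == "-":
            self.pos += 1
        if self.peek() == "INT":
            self.pos += 1
        else:
            self.call()

    def call(self):
        self.callable()
        while True:
            if self.peek() == "(":         # call with params
                self.pos += 1
                if self.peek() != ")":
                    self.expr()
                    while self.peek() == ",":
                        self.pos += 1
                        self.expr()
                self.match(")")
            elif self.peek() == "[":       # index
                self.pos += 1
                self.expr()
                self.match("]")
            elif self.peek() == ".":       # member lookup
                self.pos += 1
                self.match("ID")
            else:
                return

    def callable(self):
        if self.speculate(self.fn_literal):    # (fn_literal)=> fn_literal
            self.fn_literal()
        elif self.peek() == "(":               # OPAREN expr CPAREN
            self.pos += 1
            self.expr()
            self.match(")")
        else:
            self.match("ID")

    def fn_literal(self):
        self.match("(")
        if self.peek() == "ID":                # optional id list
            self.pos += 1
            while self.peek() == ",":
                self.pos += 1
                self.match("ID")
        self.match(")")
        self.match("{")
        while self.peek() != "}":              # stmt* where stmt is expr ';'
            self.expr()
            self.match(";")
        self.match("}")

def accepts(text):
    try:
        Parser(text).prog()
        return True
    except ParseError:
        return False

for src in ["(a, b, c){ a+b*c; }", "(x) * (y){ x.y; }", "((y){})()[1].x", "(x)(y){}"]:
    print(src, "->", "accepted" if accepts(src) else "rejected")
```

Note that this sketch re-parses a function literal after a successful speculation, so deeply nested literals get parsed many times over. That is fine for a demonstration, but a real parser would want to cache speculation results.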

A demo:

grammar T;

options {
  output=AST;
}

tokens {
 // literal tokens
 EQ     = '==' ;
 GT     = '>' ;
 LT     = '<' ;
 GTE    = '>=' ;
 LTE    = '<=' ;
 LAND   = '&&' ;
 LOR    = '||' ;
 PLUS   = '+' ;
 MINUS  = '-' ;
 TIMES  = '*' ;
 DIVIDE = '/' ;
 OPAREN = '(' ;
 CPAREN = ')' ;
 OBRACK = '[' ;
 CBRACK = ']' ;
 DOT    = '.' ;
 COMMA  = ',' ;
 OBRACE = '{' ;
 CBRACE = '}' ;
 SEMI   = ';' ;

 // imaginary tokens
 CALL;
 INDEX;
 LOOKUP;
 UNARY_MINUS;
 PARAMS;
 FN;
 ID_LIST;
 STATS;
}

prog
 : expr EOF -> expr
 ;

expr
 : boolExpr
 ;

boolExpr
 : relExpr ((LAND | LOR)^ relExpr)?
 ;

relExpr
 : (a=addExpr -> $a) ( (oa=relOp b=addExpr    -> ^($oa $a $b))
                         ( ob=relOp c=addExpr -> ^(LAND ^($oa $a $b) ^($ob $b $c))
                         )?
                     )?
 ;

addExpr
 : mulExpr ((PLUS | MINUS)^ mulExpr)*
 ;

mulExpr
 : unaryExpr ((TIMES | DIVIDE)^ unaryExpr)*
 ;

unaryExpr
 : MINUS atomExpr -> ^(UNARY_MINUS atomExpr)
 | atomExpr
 ;

atomExpr
 : INT
 | call
 ;

call
 : (callable -> callable) ( OPAREN params CPAREN -> ^(CALL $call params)
                          | OBRACK expr CBRACK   -> ^(INDEX $call expr)
                          | DOT ID               -> ^(INDEX $call ID)
                          )*
 ;

callable
 : (fn_literal)=> fn_literal
 | OPAREN expr CPAREN -> expr
 | ID
 ;

fn_literal
 : OPAREN id_list CPAREN compound_stmt -> ^(FN id_list compound_stmt)
 ;

id_list
 : (ID (COMMA ID)*)? -> ^(ID_LIST ID*)
 ;

params
 : (expr (COMMA expr)*)? -> ^(PARAMS expr*)
 ;

compound_stmt
 : OBRACE stmt* CBRACE -> ^(STATS stmt*)
 ;

stmt
 : expr SEMI
 ;

relOp
 : EQ | GT | LT | GTE | LTE
 ;

ID     : 'a'..'z'+ ;
INT    : '0'..'9'+ ;
SPACE  : (' ' | '\t') {skip();};

A parser generated from the grammar above rejects the input (x)(y){}, while it properly parses the following three snippets of code:

1

(a, b, c){ a+b*c; }

2

(x) * (y){ x.y; }

3

((y){})()[1].x



Source: https://stackoverflow.com/questions/10762153/adding-function-literals-while-abstaining-from-backtracking
