Question
I'm trying to match balanced parentheses so that a PARAMS tree is created when a match is made; otherwise the LPARAM and RPARAM tokens are simply added to the tree as atoms.
tokens
{
LIST;
PARAMS;
}
start : list -> ^(LIST list);
list : (expr|atom)+;
expr : LPARAM list? RPARAM -> ^(PARAMS list?);
atom : INT | LPARAM | RPARAM;
INT : '0'..'9'+;
LPARAM : '(';
RPARAM : ')';
At the moment, it will never create a PARAMS tree, because in the rule expr it will always see the end RPARAM as an atom, rather than as the closing token for that rule.
So at the moment, something like 1 2 3 (4) 5 is added to a LIST tree as a flat list of tokens, rather than the required grouping.
I've handled adding tokens as atoms to a tree before, but they were never able to start another rule, as LPARAM does here.
Do I need some sort of syntactic/semantic predicate here?
Answer 1:
Here is a simple approach that comes with a couple of constraints. I think these conform to the expected behavior that you mentioned in the comments.
- An unmatched LPARAM never appears inside a child list
- An unmatched RPARAM never appears inside a child list
Grammar:
start : root+ EOF -> ^(LIST root+ );
root : expr
| LPARAM
| RPARAM
;
expr : list
| atom
;
list : LPARAM expr+ RPARAM -> ^(LIST expr+)
;
atom : INT
;
Rule root matches mismatched LPARAMs and RPARAMs. Rules list and atom only care about themselves.
This solution is relatively fragile because rule root requires expr to be listed before LPARAM and RPARAM. Even so, maybe this is enough to solve your problem.
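To make the role of the alternative order in rule root more concrete, here is a rough Python sketch of a hand-written parser that mirrors the grammar above. It is not ANTLR and not generated code: the function names (tokenize, parse_list, parse_expr, parse_start) and the nested-list tree representation are made up for illustration, and it assumes that alternatives are tried in the listed order and abandoned when they fail.

```python
import re

def tokenize(text):
    # INT, LPARAM and RPARAM tokens; whitespace is skipped (an assumption,
    # the grammar above does not show a whitespace rule).
    return re.findall(r"\d+|[()]", text)

def parse_list(tokens, pos):
    """list : LPARAM expr+ RPARAM. Returns (tree, next_pos) or None."""
    if pos >= len(tokens) or tokens[pos] != "(":
        return None
    children, i = [], pos + 1
    while True:
        result = parse_expr(tokens, i)
        if result is None:
            break
        child, i = result
        children.append(child)
    # expr+ needs at least one child, and the closing paren must follow.
    if children and i < len(tokens) and tokens[i] == ")":
        return ["LIST"] + children, i + 1
    return None

def parse_expr(tokens, pos):
    """expr : list | atom. Returns (tree, next_pos) or None."""
    result = parse_list(tokens, pos)
    if result is not None:
        return result
    if pos < len(tokens) and tokens[pos].isdigit():
        return tokens[pos], pos + 1
    return None

def parse_start(text):
    """start : root+ EOF, with root : expr | LPARAM | RPARAM."""
    tokens = tokenize(text)
    children, pos = [], 0
    while pos < len(tokens):
        # Try expr first; only when it fails is the paren kept as an atom.
        result = parse_expr(tokens, pos)
        if result is None:
            result = (tokens[pos], pos + 1)
        child, pos = result
        children.append(child)
    return ["LIST"] + children
```

Under these assumptions, parse_start("1 (2) 3") returns ['LIST', '1', ['LIST', '2'], '3'], and parse_start("((1 (2) (3)") returns ['LIST', '(', '(', '1', ['LIST', '2'], ['LIST', '3']]: the unmatched LPARAMs stay at the top level and never end up inside a child list. If the fallback in parse_start were tried before parse_expr, every parenthesis would be consumed as an atom and no child list would ever be built, which is the ordering dependency described above.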
Test case 1 : no lists
Input: 1 2 3
Output:
Test case 2 : one list
Input: 1 (2) 3
Output:
Test case 3 : two lists
Input: (1) 2 (3)
Output:
Test case 4 : no lists, mismatched lefts
Input: ((1 2 3
Output:
Test case 5 : two lists, mismatched lefts
Input: ((1 (2) (3)
Output:
Test case 6 : no lists, mismatched rights
Input: 1 2 3))
Output:
Test case 7 : two lists, mismatched rights
Input: (1) (2) 3))
Output:
Test case 8 : two lists, mixed mismatched lefts
Input: ((1 (2) ( (3)
Output:
Test case 9 : two lists, mixed mismatched rights
Input: (1) ) (2) 3))
Output:
Here's a slightly more complicated grammar that operates on [] and () pairs. I think the solution is going to get exponentially worse as you add pairs, but hey, it's fun! You may also be hitting the limitation of what you can do with grammar-driven AST building.
start : root+ EOF -> ^(LIST root+ )
;
root : expr
| LPARAM
| RPARAM
| LSQB
| RSQB
;
expr : plist
| slist
| atom
;
plist : LPARAM pexpr* RPARAM -> ^(LIST pexpr*)
;
pexpr : slist
| atom
| LSQB
| RSQB
;
slist : LSQB sexpr* RSQB -> ^(LIST sexpr*)
;
sexpr : plist
| atom
| LPARAM
| RPARAM
;
atom : INT;
INT : ('0'..'9')+;
LPARAM : '(';
RPARAM : ')';
LSQB : '[';
RSQB : ']';
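The two-pair grammar can be mimicked in the same way. The following Python sketch is only an approximation of it, again assuming ordered alternatives that are abandoned on failure; PAIRS, tokenize, parse_group and parse_root are hypothetical names, and parse_group stands in for both plist and slist.

```python
import re

# opener -> closer for the two bracket kinds used in the grammar
PAIRS = {"(": ")", "[": "]"}

def tokenize(text):
    # INT plus the four bracket tokens; whitespace is skipped (an assumption).
    return re.findall(r"\d+|[()\[\]]", text)

def parse_group(tokens, pos):
    """plist / slist: opener item* closer. Returns (tree, next_pos) or None."""
    if pos >= len(tokens) or tokens[pos] not in PAIRS:
        return None
    opener = tokens[pos]
    closer = PAIRS[opener]
    children, i = [], pos + 1
    # pexpr / sexpr never match a bracket of the enclosing kind.
    while i < len(tokens) and tokens[i] not in (opener, closer):
        # A group of the other kind is tried first; otherwise the token
        # (INT or a bracket of the other kind) is kept as an atom.
        result = parse_group(tokens, i)
        if result is None:
            result = (tokens[i], i + 1)
        child, i = result
        children.append(child)
    if i < len(tokens) and tokens[i] == closer:
        return ["LIST"] + children, i + 1
    return None

def parse_root(text):
    """start : root+ EOF, with root : expr | LPARAM | RPARAM | LSQB | RSQB."""
    tokens = tokenize(text)
    children, pos = [], 0
    while pos < len(tokens):
        result = parse_group(tokens, pos)
        if result is None:
            result = (tokens[pos], pos + 1)
        child, pos = result
        children.append(child)
    return ["LIST"] + children
```

With this sketch, parse_root("1 [2 (3)] 4") returns ['LIST', '1', ['LIST', '2', ['LIST', '3']], '4'], and parse_root("( [ )") returns ['LIST', ['LIST', '[']], so an unmatched square bracket can sit inside a parenthesized list. Note that pexpr, as written above, has no plist alternative, so directly nested parentheses are not grouped: parse_root("((1))") returns ['LIST', '(', ['LIST', '1'], ')'].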
Source: https://stackoverflow.com/questions/13980501/parse-parenthesis-as-atoms-antlr