Question
I'm trying to match balanced parentheses so that a PARAMS tree is created when a match is made; otherwise the LPARAM and RPARAM tokens are simply added to the tree as atoms.
tokens
{
LIST;
PARAMS;
}
start : list -> ^(LIST list);
list : (expr|atom)+;
expr : LPARAM list? RPARAM -> ^(PARAMS list?);
atom : INT | LPARAM | RPARAM;
INT : '0'..'9'+;
LPARAM : '(';
RPARAM : ')';
At the moment, it never creates a PARAMS tree, because in the rule expr it always sees the closing RPARAM as an atom rather than as the closing token for that rule.
So something like 1 2 3 (4) 5 is added to a LIST tree as a flat list of tokens, rather than with the required grouping.
I've handled adding tokens as atoms to a tree before, but they were never able to start another rule, as LPARAM does here.
Do I need some sort of syntactic/semantic predicate here?
Answer 1:
Here is a simple approach that comes with a couple of constraints. I think these conform to the expected behavior that you mentioned in the comments.
- An unmatched LPARAM never appears inside a child list
- An unmatched RPARAM never appears inside a child list
Grammar:
start : root+ EOF -> ^(LIST root+ );
root : expr
| LPARAM
| RPARAM
;
expr : list
| atom
;
list : LPARAM expr+ RPARAM -> ^(LIST expr+)
;
atom : INT
;
Rule root matches mismatched LPARAMs and RPARAMs. Rules list and atom only care about themselves.
This solution is relatively fragile because rule root requires expr to be listed before LPARAM and RPARAM. Even so, maybe this is enough to solve your problem.
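To make the ordering point concrete, here is a minimal Python sketch of the same four rules, assuming the alternatives are tried in the order written and that a failed list attempt simply falls back to the next alternative. It is a hand-written stand-in for illustration, not what ANTLR generates, and the function names (tokenize, parse_start, and so on) are made up for this example.

```python
import re

def tokenize(text):
    # INT, LPARAM and RPARAM tokens; any other characters are silently skipped
    return re.findall(r"\d+|[()]", text)

def parse_atom(tokens, i):
    # atom : INT
    if i < len(tokens) and tokens[i].isdigit():
        return tokens[i], i + 1
    return None

def parse_list(tokens, i):
    # list : LPARAM expr+ RPARAM -> ^(LIST expr+)
    if i >= len(tokens) or tokens[i] != "(":
        return None
    children, j = [], i + 1
    while True:
        result = parse_expr(tokens, j)
        if result is None:
            break
        node, j = result
        children.append(node)
    # expr+ needs at least one child, and the list must be closed by RPARAM
    if children and j < len(tokens) and tokens[j] == ")":
        return ["LIST"] + children, j + 1
    return None

def parse_expr(tokens, i):
    # expr : list | atom
    return parse_list(tokens, i) or parse_atom(tokens, i)

def parse_root(tokens, i):
    # root : expr | LPARAM | RPARAM  (expr is tried first)
    result = parse_expr(tokens, i)
    if result is not None:
        return result
    if i < len(tokens) and tokens[i] in ("(", ")"):
        return tokens[i], i + 1  # unmatched parenthesis kept as an atom
    return None

def parse_start(text):
    # start : root+ EOF -> ^(LIST root+)
    # (an empty input yields a bare ["LIST"] here instead of an error)
    tokens = tokenize(text)
    children, i = [], 0
    while i < len(tokens):
        node, i = parse_root(tokens, i)
        children.append(node)
    return ["LIST"] + children

print(parse_start("1 (2) 3"))        # ['LIST', '1', ['LIST', '2'], '3']
print(parse_start("((1 (2) (3)"))    # ['LIST', '(', '(', '1', ['LIST', '2'], ['LIST', '3']]
print(parse_start("(1) ) (2) 3))"))  # ['LIST', ['LIST', '1'], ')', ['LIST', '2'], '3', ')', ')']
```

In this sketch, if parse_root tried the bare LPARAM before expr, every opening parenthesis would be consumed as an atom and no child list would ever be built, which is the fragility mentioned above. The printed trees are what this sketch returns for those inputs; they are not captured ANTLR output.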
Test case 1 : no lists
Input: 1 2 3
Output:

Test case 2 : one list
Input: 1 (2) 3
Output:

Test case 3 : two lists
Input: (1) 2 (3)
Output:

Test case 4 : no lists, mismatched lefts
Input: ((1 2 3
Output:

Test case 5 : two lists, mismatched lefts
Input: ((1 (2) (3)
Output:

Test case 6 : no lists, mismatched rights
Input: 1 2 3))
Output:

Test case 7 : two lists, mismatched rights
Input: (1) (2) 3))
Output:

Test case 8 : two lists, mixed mismatched lefts
Input: ((1 (2) ( (3)
Output:

Test case 9 : two lists, mixed mismatched rights
Input: (1) ) (2) 3))
Output:

Here's a slightly more complicated grammar that operates on [] and () pairs. I think the solution is going to get exponentially worse as you add pairs, but hey, it's fun! You may also be hitting the limitation of what you can do with grammar-driven AST building.
start : root+ EOF -> ^(LIST root+ )
;
root : expr
| LPARAM
| RPARAM
| LSQB
| RSQB
;
expr : plist
| slist
| atom
;
plist : LPARAM pexpr* RPARAM -> ^(LIST pexpr*)
;
pexpr : slist
| atom
| LSQB
| RSQB
;
slist : LSQB sexpr* RSQB -> ^(LIST sexpr*)
;
sexpr : plist
| atom
| LPARAM
| RPARAM
;
atom : INT;
INT : ('0'..'9')+;
LPARAM : '(';
RPARAM : ')';
LSQB : '[';
RSQB : ']';
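If the grammar really does become unmanageable as pairs are added, one alternative is to group the token stream in a separate pass instead of in the grammar. Below is a minimal Python sketch of that idea using a stack of open brackets. This is a different technique from the grammar above, I make no claim that it yields the same tree as ANTLR for every input, and the names PAIRS and group_brackets are made up for this example.

```python
# Closing bracket -> opening bracket; adding a pair means adding one entry here
PAIRS = {")": "(", "]": "["}

def group_brackets(tokens):
    # Each frame is [opener, children]; the bottom frame is the root LIST
    frames = [[None, []]]
    for tok in tokens:
        if tok in PAIRS.values():
            frames.append([tok, []])  # tentatively open a child list
        elif tok in PAIRS:
            # Look for the nearest open frame whose opener matches this closer
            match = next(
                (k for k in range(len(frames) - 1, 0, -1)
                 if frames[k][0] == PAIRS[tok]),
                None,
            )
            if match is None:
                frames[-1][1].append(tok)  # unmatched closer stays an atom
                continue
            # Openers above the match were never closed: demote them to atoms
            while len(frames) - 1 > match:
                opener, kids = frames.pop()
                frames[-1][1].extend([opener] + kids)
            _, kids = frames.pop()
            frames[-1][1].append(["LIST"] + kids)
        else:
            frames[-1][1].append(tok)  # INT or any other atom
    # Anything still open at end of input was never closed
    while len(frames) > 1:
        opener, kids = frames.pop()
        frames[-1][1].extend([opener] + kids)
    return ["LIST"] + frames[0][1]

print(group_brackets(["1", "(", "2", ")", "3"]))  # ['LIST', '1', ['LIST', '2'], '3']
print(group_brackets(["(", "[", "1", ")"]))       # ['LIST', ['LIST', '[', '1']]
print(group_brackets(["(", "1", "]"]))            # ['LIST', '(', '1', ']']
```

The trade-off is that the grouping policy lives in code rather than in the grammar, but supporting another bracket pair is a one-line change to PAIRS instead of another set of rules.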
Source: https://stackoverflow.com/questions/13980501/parse-parenthesis-as-atoms-antlr