Parse Parenthesis as atoms ANTLR

Question

I'm trying to match balanced parentheses so that a PARAMS tree is created when a pair matches; otherwise the LPARAM and RPARAM tokens should simply be added to the tree as atoms.

tokens
{
    LIST;    
    PARAMS;
}

start   : list -> ^(LIST list);

list    : (expr|atom)+;

expr : LPARAM list? RPARAM -> ^(PARAMS list?);

atom :  INT | LPARAM | RPARAM;

INT :   '0'..'9'+;
LPARAM  :   '(';
RPARAM  :   ')';

At the moment, it never creates a PARAMS tree, because inside rule expr the closing RPARAM is always consumed as an atom rather than as the closing token for that rule.

So at the moment, something like 1 2 3 (4) 5 is added to a LIST tree as a flat list of tokens, rather than the required grouping.

I've handled adding tokens as atoms to a tree before, but those tokens were never able to start another rule, as LPARAM does here.

Do I need some sort of syntactic/semantic predicate here?

Answer 1:

Here is a simple approach that comes with a couple of constraints. I think these conform to the expected behavior that you mentioned in the comments.

An unmatched LPARAM never appears inside a child list
An unmatched RPARAM never appears inside a child list

Grammar:

start   : root+ EOF -> ^(LIST root+ );

root    : expr
        | LPARAM
        | RPARAM
        ;

expr    : list
        | atom
        ;           

list    : LPARAM expr+ RPARAM -> ^(LIST expr+)
        ;

atom    : INT
        ;

Rule root matches mismatched LPARAMs and RPARAMs. Rules list and atom only care about themselves.

This solution is relatively fragile because rule root requires expr to be listed before LPARAM and RPARAM. Even so, maybe this is enough to solve your problem.
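To make the ordering point concrete, here is a rough Python sketch of the same idea as a hand-written recursive-descent parser. It is not what ANTLR generates, and it assumes ordered choice with fallback (roughly what you would get with backtracking enabled, which the grammar above does not state). The names tokenize, parse_expr, parse_list and parse_start are made up for the illustration, and trees are plain nested Python lists.

```python
import re

def tokenize(text):
    # INT, LPARAM and RPARAM tokens; whitespace is skipped.
    return re.findall(r"\d+|[()]", text)

def parse_expr(tokens, i):
    # expr : list | atom ; returns (node, next_index), or None on failure.
    if i < len(tokens) and tokens[i] == "(":
        return parse_list(tokens, i)
    if i < len(tokens) and tokens[i].isdigit():
        return tokens[i], i + 1
    return None

def parse_list(tokens, i):
    # list : LPARAM expr+ RPARAM -> ^(LIST expr+)
    children, j = [], i + 1
    while (result := parse_expr(tokens, j)) is not None:
        node, j = result
        children.append(node)
    if children and j < len(tokens) and tokens[j] == ")":
        return ["LIST", *children], j + 1
    return None  # unbalanced or empty: the caller falls back to a bare token

def parse_start(text):
    # start : root+ EOF -> ^(LIST root+) ; root : expr | LPARAM | RPARAM
    # (assumption: unlike root+, this sketch does not reject empty input)
    tokens, i, roots = tokenize(text), 0, []
    while i < len(tokens):
        result = parse_expr(tokens, i)  # expr is tried first, as in rule root
        if result is None:
            roots.append(tokens[i])  # stray parenthesis kept as an atom
            i += 1
        else:
            node, i = result
            roots.append(node)
    return ["LIST", *roots]
```

With this sketch, parse_start("1 (2) 3") returns ["LIST", "1", ["LIST", "2"], "3"], while parse_start("((1 2 3") returns ["LIST", "(", "(", "1", "2", "3"]. The stray parentheses only ever show up at the top level, which matches the two constraints listed earlier. Note that every failed list attempt is re-parsed from the next token, so deeply unbalanced input can get slow.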

Test cases (the output trees in the original answer were images and are not reproduced here):

Test case 1 : no lists
Input: 1 2 3

Test case 2 : one list
Input: 1 (2) 3

Test case 3 : two lists
Input: (1) 2 (3)

Test case 4 : no lists, mismatched lefts
Input: ((1 2 3

Test case 5 : two lists, mismatched lefts
Input: ((1 (2) (3)

Test case 6 : no lists, mismatched rights
Input: 1 2 3))

Test case 7 : two lists, mismatched rights
Input: (1) (2) 3))

Test case 8 : two lists, mixed mismatched lefts
Input: ((1 (2) ( (3)

Test case 9 : two lists, mixed mismatched rights
Input: (1) ) (2) 3))

Here's a slightly more complicated grammar that handles both [] and () pairs. I think the solution is going to get exponentially worse as you add more kinds of pairs, but hey, it's fun! You may also be hitting the limits of what you can do with grammar-driven AST building.

start   : root+ EOF -> ^(LIST root+ )
        ;

root    : expr
        | LPARAM
        | RPARAM
        | LSQB
        | RSQB
        ;

expr    : plist
        | slist
        | atom
        ;           

plist   : LPARAM pexpr* RPARAM -> ^(LIST pexpr*)
        ;

pexpr   : slist
        | atom
        | LSQB
        | RSQB
        ;       

slist   : LSQB sexpr* RSQB -> ^(LIST sexpr*)
        ;

sexpr   : plist
        | atom
        | LPARAM
        | RPARAM
        ;               

atom    : INT;

INT     : ('0'..'9')+;
LPARAM  : '(';
RPARAM  : ')';
LSQB    : '[';
RSQB    : ']';
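For comparison, here is the same kind of rough Python sketch for the two-bracket grammar. Again, this is a hand-written approximation that assumes ordered choice with fallback, not ANTLR output, and the names CLOSER, tokenize_brackets, parse_group and parse_brackets are invented for the example. One function covers both plist and slist because the two rules are mirror images of each other.

```python
import re

# Opening bracket -> its matching closing bracket.
CLOSER = {"(": ")", "[": "]"}

def tokenize_brackets(text):
    # INT, LPARAM, RPARAM, LSQB and RSQB tokens; whitespace is skipped.
    return re.findall(r"\d+|[()\[\]]", text)

def parse_group(tokens, i):
    # plist : LPARAM pexpr* RPARAM  /  slist : LSQB sexpr* RSQB
    # Returns (["LIST", items...], next_index), or None if the group fails.
    opener, children, j = tokens[i], [], i + 1
    closer = CLOSER[opener]
    while j < len(tokens) and tokens[j] != closer:
        tok = tokens[j]
        if tok == opener:
            return None  # pexpr has no LPARAM/plist, sexpr has no LSQB/slist
        # An opener of the other kind may start a nested group.
        nested = parse_group(tokens, j) if tok in CLOSER else None
        if nested is not None:
            node, j = nested
            children.append(node)
        else:
            children.append(tok)  # INT, or an other-kind bracket as an atom
            j += 1
    if j == len(tokens):
        return None  # ran out of input before the matching closer
    return ["LIST", *children], j + 1

def parse_brackets(text):
    # start : root+ EOF ; root : expr | LPARAM | RPARAM | LSQB | RSQB
    tokens, i, roots = tokenize_brackets(text), 0, []
    while i < len(tokens):
        result = parse_group(tokens, i) if tokens[i] in CLOSER else None
        if result is None:
            roots.append(tokens[i])  # INT or stray bracket at the top level
            i += 1
        else:
            node, i = result
            roots.append(node)
    return ["LIST", *roots]
```

For example, parse_brackets("1 (2 [3]) 4") returns ["LIST", "1", ["LIST", "2", ["LIST", "3"]], "4"], and parse_brackets("[ ( 1 ]") returns ["LIST", ["LIST", "(", "1"]], with the unmatched ( kept as an atom inside the square-bracket list. Because pexpr has no plist alternative in the grammar as written, the sketch likewise refuses to nest () directly inside (), so parse_brackets("((1))") gives ["LIST", "(", ["LIST", "1"], ")"].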

Source: https://stackoverflow.com/questions/13980501/parse-parenthesis-as-atoms-antlr

Tags

parsing

antlr

grammar

expression