Parse Parenthesis as atoms ANTLR

帅比萌擦擦* submitted on 2019-12-24 10:19:59

Question


I'm trying to match balanced parentheses so that a PARAMS tree is created when a pair matches; otherwise the LPARAM and RPARAM tokens are simply added to the tree as atoms.

tokens
{
    LIST;    
    PARAMS;
}

start   : list -> ^(LIST list);

list    : (expr|atom)+;

expr : LPARAM list? RPARAM -> ^(PARAMS list?);

atom :  INT | LPARAM | RPARAM;

INT :   '0'..'9'+;
LPARAM  :   '(';
RPARAM  :   ')';

At the moment it never creates a PARAMS tree, because in rule expr the trailing RPARAM is always consumed as an atom rather than as the closing token of that rule.

As a result, input such as 1 2 3 (4) 5 is added to the LIST tree as a flat list of tokens, without the required grouping around 4.

I've handled adding tokens as atoms to a tree before, but they were never able to start another rule, as LPARAM does here.

Do I need some sort of syntactic/semantic predicate here?


Answer 1:


Here is a simple approach that comes with a couple of constraints. I think these conform to the expected behavior that you mentioned in the comments.

  • An unmatched LPARAM never appears inside a child list
  • An unmatched RPARAM never appears inside a child list

Grammar:

start   : root+ EOF -> ^(LIST root+ );

root    : expr
        | LPARAM
        | RPARAM
        ;

expr    : list
        | atom
        ;           

list    : LPARAM expr+ RPARAM -> ^(LIST expr+)
        ;

atom    : INT
        ;

Rule root matches mismatched LPARAMs and RPARAMs. Rules list and atom only care about themselves.

This solution is relatively fragile because rule root requires expr to be listed before LPARAM and RPARAM. Even so, maybe this is enough to solve your problem.
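To make the role of alternative order concrete, here is a minimal Python sketch that emulates the rules above with ordered choice: at each position it first tries expr and only falls back to a bare LPARAM or RPARAM when that fails. It is a hand-written approximation, not ANTLR-generated code. The function names (tokenize, parse_expr, parse_list, parse_atom, parse_start) are made up for illustration, and skipping whitespace is an assumption because the grammar above shows no WS rule.

```python
import re

def tokenize(text):
    # Assumption: whitespace is skipped; only INT, '(' and ')' are tokens.
    return re.findall(r"\d+|[()]", text)

def parse_atom(toks, i):
    """atom : INT. Returns (node, next_index) or None."""
    if i < len(toks) and toks[i].isdigit():
        return toks[i], i + 1
    return None

def parse_list(toks, i):
    """list : LPARAM expr+ RPARAM -> ^(LIST expr+). Returns (node, next_index) or None."""
    if i >= len(toks) or toks[i] != "(":
        return None
    children, j = [], i + 1
    while (r := parse_expr(toks, j)) is not None:
        node, j = r
        children.append(node)
    if children and j < len(toks) and toks[j] == ")":
        return ["LIST", *children], j + 1
    return None  # unbalanced: the caller falls back to a bare LPARAM

def parse_expr(toks, i):
    """expr : list | atom."""
    return parse_list(toks, i) or parse_atom(toks, i)

def parse_start(text):
    """start : root+ EOF, with root : expr | LPARAM | RPARAM tried in that order."""
    toks = tokenize(text)
    out, i = ["LIST"], 0
    while i < len(toks):
        r = parse_expr(toks, i)
        if r is None:
            r = (toks[i], i + 1)  # stray parenthesis kept as a plain atom
        node, i = r
        out.append(node)
    return out

print(parse_start("1 (2) 3"))   # ['LIST', '1', ['LIST', '2'], '3']
print(parse_start("((1 2 3"))   # ['LIST', '(', '(', '1', '2', '3']
```

If the fallback to a bare parenthesis were tried before parse_expr, every ( would become an atom and no nested LIST would ever be built, which is the fragility described above.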

Test cases (each output was a tree image that is not preserved in this copy):

  • Test case 1, no lists: 1 2 3
  • Test case 2, one list: 1 (2) 3
  • Test case 3, two lists: (1) 2 (3)
  • Test case 4, no lists, mismatched lefts: ((1 2 3
  • Test case 5, two lists, mismatched lefts: ((1 (2) (3)
  • Test case 6, no lists, mismatched rights: 1 2 3))
  • Test case 7, two lists, mismatched rights: (1) (2) 3))
  • Test case 8, two lists, mixed mismatched lefts: ((1 (2) ( (3)
  • Test case 9, two lists, mixed mismatched rights: (1) ) (2) 3))


Here's a slightly more complicated grammar that operates on [] and () pairs. I think the solution is going to get exponentially worse as you add pairs, because each bracket type needs its own list rule and its own set of allowed children, but hey, it's fun! You may also be hitting the limits of what you can do with grammar-driven AST building.

start   : root+ EOF -> ^(LIST root+ )
        ;

root    : expr
        | LPARAM
        | RPARAM
        | LSQB
        | RSQB
        ;

expr    : plist
        | slist
        | atom
        ;           

plist   : LPARAM pexpr* RPARAM -> ^(LIST pexpr*)
        ;

pexpr   : slist
        | atom
        | LSQB
        | RSQB
        ;       

slist   : LSQB sexpr* RSQB -> ^(LIST sexpr*)
        ;

sexpr   : plist
        | atom
        | LPARAM
        | RPARAM
        ;               

atom    : INT;

INT     : ('0'..'9')+;
LPARAM  : '(';
RPARAM  : ')';
LSQB    : '[';
RSQB    : ']';
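As a rough illustration of what these rules accept, the following Python sketch emulates them with a table of bracket pairs instead of one rule per pair. It is an approximation under stated assumptions, not ANTLR output: the names PAIRS, tokenize, parse_list and parse_start are hypothetical, whitespace is assumed to be skipped, and loops are greedy with ordered choice, so for ambiguous inputs such as ( [ 1 ) ] a real ANTLR parser may pick a different tree.

```python
import re

# Opening bracket -> closing bracket. More pairs could be added here (and to the tokenizer).
PAIRS = {"(": ")", "[": "]"}

def tokenize(text):
    # Assumption: whitespace is skipped; tokens are INT and the four brackets.
    return re.findall(r"\d+|[()\[\]]", text)

def parse_list(toks, i):
    """Mirror plist/slist: OPEN item* CLOSE -> ^(LIST item*). Returns (node, next_index) or None."""
    if i >= len(toks) or toks[i] not in PAIRS:
        return None
    open_tok, close_tok = toks[i], PAIRS[toks[i]]
    children, j = [], i + 1
    while j < len(toks) and toks[j] != close_tok:
        tok = toks[j]
        if tok == open_tok:
            # pexpr has no plist or LPARAM alternative (likewise sexpr for [ ]),
            # so a same-type opener directly inside makes the list fail.
            return None
        sub = parse_list(toks, j)       # try a nested list of another bracket type first
        if sub is not None:
            node, j = sub
        else:
            node, j = tok, j + 1        # INT, or a stray bracket of another type
        children.append(node)
    if j < len(toks):                   # loop stopped at the closing token
        return ["LIST", *children], j + 1
    return None                         # ran out of input: unbalanced

def parse_start(text):
    """start : root+ EOF, where root tries a list first, then any single token."""
    toks = tokenize(text)
    out, i = ["LIST"], 0
    while i < len(toks):
        sub = parse_list(toks, i)
        node, i = sub if sub is not None else (toks[i], i + 1)
        out.append(node)
    return out

print(parse_start("1 (2 [3] ]) 4"))  # ['LIST', '1', ['LIST', '2', ['LIST', '3'], ']'], '4']
```

One consequence of the rules as written, visible in the sketch, is that a list cannot directly contain a list of the same bracket type: for ((1)) the outer parentheses come out as stray atoms around an inner LIST.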


Source: https://stackoverflow.com/questions/13980501/parse-parenthesis-as-atoms-antlr
