Parsing fortran-style .op. operators

南楼画角 提交于 2019-12-01 13:10:57

问题


I'm trying to write an ANTLR4 grammar for a fortran-inspired DSL. I'm having difficulty with the 'ole classic ".op." operators:

if (1.and.1) then

where both "1"s should be intepreted as integer. I looked at the OpenFortranParser for insight, but I can't make sense out of it.

Initially, I had suitable definitions for INTEGER and REAL in my lexer. Consequently, the first "1" above always parsed as a REAL, no matter what I tried. I tried moving things into the parser, and got it to the point where I could reliably recognize the ".and." along with numbers around it as appropriately INTEGER or REAL.

if (1.and.1)   # INT/INT
if (1..and..1) # REAL/REAL

...etc...

I of course want to recognize variable-names in such statements:

if (a.and.b)

and have an appropriate rule for ID. In the small grammar below, however, any literals in quotes (ex, 'and', 'if', all the single-character numerical suffixes) are not accepted as an ID, and I get an error; any other ID-conforming string is accepted:

if (a.and.b)  # errs, as 'b' is valid INTEGER suffix
if (a.and.c)  # OK

Any insights into this behavior, or better suggestions on how to parse the .op. operators in fortran would be greatly appreciated -- Thanks!

grammar Foo;

start  : ('if' expr | ID)+ ;

DOT : '.' ;

DIGITS: [0-9]+;

ID : [a-zA-Z0-9][a-zA-Z0-9_]* ;

andOp : DOT 'and' DOT ;

SIGN : [+-];

expr     
    : ID
    | expr andOp expr
    | numeric
    | '(' expr ')'
    ;

integer : DIGITS ('q'|'Q'|'l'|'L'|'h'|'H'|'b'|'B'|'i'|'I')? ;

real    
    : DIGITS DOT DIGITS? (('e'|'E') SIGN? DIGITS)? ('d' | 'D')?
    |        DOT DIGITS  (('e'|'E') SIGN? DIGITS)? ('d' | 'D')?
    ;

numeric : integer | real;

EOLN  : '\r'? '\n' -> skip;

WS    :  [ \t]+ -> skip;   

回答1:


To disambiguate DOT, add a lexer rule with a predicate just before the DOT rule.

DIT : DOT { isDIT() }? ;
DOT : '.' ;

Change the 'andOp'

andOp : DIT 'and' DIT ;

Then add a predicate method

@lexer::members {

public boolean isDIT() {
    int offset = _tokenStartCharIndex;
    String r = _input.getText(Interval.of(offset-4, offset));
    String s = _input.getText(Interval.of(offset, offset+4));
    if (".and.".equals(s) || ".and.".equals(r)) {
        return true;
    }
    return false;
}

}

But, that is not really the source of your current problem. The integer parser rule defines lexer constants effectively outside of the lexer, which is why 'b' is not matched to an ID.

Change it to

integer : INT ;

INT:  DIGITS ('q'|'Q'|'l'|'L'|'h'|'H'|'b'|'B'|'i'|'I')? ;

and the lexer will figure out the rest.



来源:https://stackoverflow.com/questions/23141515/parsing-fortran-style-op-operators

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!