ANTLR: What is the simplest way to implement a Python-like indentation-dependent grammar?

自闭症患者 2020-12-01 06:24

I am trying to implement a Python-like indentation-dependent grammar.

Source example:

ABC QWE
  CDE EFG
  EFG CDE
    ABC 
  QWE ZXC

As I see,

4 answers
  •  天涯浪人
    2020-12-01 07:29

    There is a relatively simple way to do this in ANTLR, which I wrote as an experiment: DentLexer.g4. This solution differs from the others mentioned on this page, which were written by Kiers and Shavit. It integrates with the runtime solely via an override of the Lexer's nextToken() method. It does its work by examining tokens: (1) a NEWLINE token triggers the start of a "keep track of indentation" phase; (2) during that phase, whitespace and comments, both set to channel HIDDEN, are counted and ignored, respectively; and (3) any non-HIDDEN token ends the phase. Controlling the indentation logic is therefore a simple matter of setting a token's channel.

    Both of the solutions mentioned on this page require a NEWLINE token to also grab all the subsequent whitespace, but in doing so can't handle multi-line comments interrupting that whitespace. Dent, instead, keeps NEWLINE and whitespace tokens separate and can handle multi-line comments.
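
    The three-step phase described above can be sketched in plain Java, without the ANTLR runtime. This is only an illustration of the idea, not the code from the linked grammar: the Tok class, the DentSketch class, and the dent() method are hypothetical names, and the indent stack, the "start of input counts as pending" choice, and the DEDENTs emitted at end of input are assumptions. It also does not report an error when a line dedents to a level that was never opened.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a lexer token: a type name, its text, and
// whether it is on the HIDDEN channel. Not an ANTLR runtime class.
final class Tok {
    final String type;
    final String text;
    final boolean hidden;

    Tok(String type, String text, boolean hidden) {
        this.type = type;
        this.text = text;
        this.hidden = hidden;
    }
}

final class DentSketch {
    // Returns the types of the visible tokens with INDENT/DEDENT inserted.
    static List<String> dent(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        Deque<Integer> indents = new ArrayDeque<>();
        indents.push(0);             // column 0 is the outermost level
        boolean pendingDent = true;  // assumption: start of input is like "after a newline"
        int indentCount = 0;

        for (Tok t : tokens) {
            if (t.type.equals("NEWLINE")) {
                // (1) NEWLINE starts the indentation-tracking phase. A NEWLINE
                // seen while the phase is already active ends a blank or
                // comment-only line, so it is dropped (treated as HIDDEN).
                if (!pendingDent) {
                    out.add("NEWLINE");
                }
                pendingDent = true;
                indentCount = 0;
            } else if (t.hidden) {
                // (2) During the phase, whitespace is counted; comments are ignored.
                if (pendingDent && t.type.equals("WS")) {
                    indentCount += t.text.length();
                }
            } else {
                // (3) The first non-HIDDEN token ends the phase.
                if (pendingDent) {
                    if (indentCount > indents.peek()) {
                        indents.push(indentCount);
                        out.add("INDENT");
                    } else {
                        while (indentCount < indents.peek()) {
                            indents.pop();
                            out.add("DEDENT");
                        }
                    }
                    pendingDent = false;
                }
                out.add(t.type);
            }
        }
        // Assumption: close any blocks still open at end of input.
        while (indents.peek() > 0) {
            indents.pop();
            out.add("DEDENT");
        }
        out.add("EOF");
        return out;
    }
}
```

    Fed the five-line source example from the question, with each line's content lexed as a single token, this sketch yields an INDENT before the second line, another INDENT before the fourth, a DEDENT before the fifth, and a final DEDENT before EOF. A comment-only line between two statements produces no tokens at all, because its trailing NEWLINE arrives while the phase is still pending.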

    Your grammar would be set up something like below. Note that the NEWLINE and WS lexer rules have actions that control the pendingDent state and keep track of indentation level with the indentCount variable.

    grammar MyGrammar;
    
    tokens { INDENT, DEDENT }
    
    @lexer::members {
        // override of nextToken(), see Dent.g4 grammar on github
        // https://github.com/wevrem/wry/blob/master/grammars/Dent.g4
    }
    
    script : ( NEWLINE | statement )* EOF ;
    
    statement
        :   simpleStatement
        |   blockStatements
        ;
    
    simpleStatement : LEGIT+ NEWLINE ;
    
    blockStatements : LEGIT+ NEWLINE INDENT statement+ DEDENT ;
    
    NEWLINE : ( '\r'? '\n' | '\r' ) {
        if (pendingDent) { setChannel(HIDDEN); }
        pendingDent = true;
        indentCount = 0;
        initialIndentToken = null;
    } ;
    
    WS : [ \t]+ {
        setChannel(HIDDEN);
        if (pendingDent) { indentCount += getText().length(); }
    } ;
    
    BlockComment : '/*' ( BlockComment | . )*? '*/' -> channel(HIDDEN) ;   // allow nesting comments
    LineComment : '//' ~[\r\n]* -> channel(HIDDEN) ;
    
    LEGIT : ~[ \t\r\n]+ ~[\r\n]*;   // Replace with your language-specific rules...
    
