Parsing single line comments

Submitted by 泪湿孤枕 on 2019-12-11 00:54:30

Question


I am trying to write a grammar for parsing single-line comments. Comments start with '--' and can appear anywhere in the file.

My basic grammar is shown below.

Grammar (aa.g4):

grammar aa;

statement
    :   commentStatement* ifStatement
    |   commentStatement* returnStatement
    ;
ifStatement
    :   'if' '(' expression ')'
        returnStatement+
    ;

returnStatement  :   'return' expression ';' ;
commentStatement :   '--' (.+?) '\\n'? ;
expression       :   IDENTIFIER ;

IDENTIFIER       :   [a-z]([A-Za-z0-9\-\_])* ;
NEWLINE          :   '\r'? '\n'    -> skip ;
WS               :   [ \t\r\f\n]+ -> skip ;

Test class:

public class aaTest {
    static class aaListener extends aaBaseListener {
        public void enterCommentStatement(CommentStatementContext ctx) {
            System.out.println(ctx.getText());
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream is = new FileInputStream("aa.txt");
        CharStream stream = new ANTLRInputStream(is);
        aaLexer lexer = new aaLexer(stream);
        TokenStream tokenStream = new CommonTokenStream(lexer);
        aaParser parser = new aaParser(tokenStream);
        ParseTree aParseTree = parser.statement();
        ParseTreeWalker aWalker = new ParseTreeWalker();
        aWalker.walk(new aaListener(), aParseTree);;
    }
}

Input:

--comment1
-- if comment
if (x) --mid if comment
  --end comment
return result;

Output:

--comment1a
--ifcommentif(x)     <<< error output
--midifcomment
--endcomment

Queries:

  1. What causes the erroneous output above? I need only "-- if comment" to be printed for that line.
  2. How do I retrieve and print the actual comment text, with its spaces preserved?

Answer 1:


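First, define the line comment as a lexer rule that states exactly what you mean: '--' followed by every character up to the end of the line. The non-greedy operator in your commentStatement parser rule does not behave the way you intend: the lexer has already skipped all whitespace and newlines, so .+? matches whole tokens and has no line ending to stop at.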

LineComment
  : '--' ~[\r\n]* -> channel(HIDDEN)
  ;
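
To see what this rule captures, the character pattern '--' ~[\r\n]* can be approximated with the regular expression --[^\r\n]*. The sketch below is only a java.util.regex analogue run over the question's input, not ANTLR itself; the class and method names (LineCommentDemo, findComments) are made up for illustration. Unlike a real lexer, a regex search would also match '--' in the middle of other text, which does not occur in this input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineCommentDemo {
    // Regex analogue of the lexer rule: '--' ~[\r\n]*
    private static final Pattern LINE_COMMENT = Pattern.compile("--[^\\r\\n]*");

    // Returns every comment found in the input, in order, spaces intact.
    static List<String> findComments(String input) {
        List<String> comments = new ArrayList<>();
        Matcher m = LINE_COMMENT.matcher(input);
        while (m.find()) {
            comments.add(m.group());
        }
        return comments;
    }

    public static void main(String[] args) {
        String input = "--comment1\n"
                + "-- if comment\n"
                + "if (x) --mid if comment\n"
                + "  --end comment\n"
                + "return result;\n";
        // Prints:
        // --comment1
        // -- if comment
        // --mid if comment
        // --end comment
        for (String comment : findComments(input)) {
            System.out.println(comment);
        }
    }
}
```

Each match stops at the line ending and keeps its internal spaces, which is the behavior both queries ask for.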

Second, if you want the token stream to contain information about whitespace and newline characters, you should move them to the hidden channel instead of using the skip command. The skip command completely drops the token, making it appear as though the text was never even in the input at all.

NEWLINE
  : '\r'? '\n' -> channel(HIDDEN)
  ;

WS
  : [ \t\f]+ -> channel(HIDDEN)
  ;

Comments will not appear in the parse tree, and you won't use LineComment in any of your parser rules. To get information about these tokens before or after another token in the parse tree, you can examine the tokens around a specific index directly (using TokenStream.get(int)) or with a utility method like BufferedTokenStream.getHiddenTokensToRight or BufferedTokenStream.getHiddenTokensToLeft.
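
The idea behind the hidden-token lookup can be sketched without the ANTLR runtime. The code below is a simplified model, not ANTLR's implementation: Tok, hiddenToLeft, commentsBefore and sampleTokens are hypothetical names, and the token list is written out by hand for part of the question's input. It assumes the lookup collects every off-channel token to the left of a given index until it reaches a token on the default channel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HiddenChannelDemo {
    static final int DEFAULT = 0; // channel the parser consumes
    static final int HIDDEN = 1;  // channel the parser never sees

    // Hypothetical minimal token: its text and the channel it was emitted on.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    // Walks left from tokens[index], collecting off-channel tokens until the
    // nearest default-channel token. The result is in source order.
    static List<Tok> hiddenToLeft(List<Tok> tokens, int index) {
        List<Tok> hidden = new ArrayList<>();
        for (int i = index - 1; i >= 0 && tokens.get(i).channel != DEFAULT; i--) {
            hidden.add(0, tokens.get(i));
        }
        return hidden;
    }

    // Keeps only the line comments among the hidden tokens to the left.
    static List<String> commentsBefore(List<Tok> tokens, int index) {
        List<String> comments = new ArrayList<>();
        for (Tok t : hiddenToLeft(tokens, index)) {
            if (t.text.startsWith("--")) {
                comments.add(t.text);
            }
        }
        return comments;
    }

    // Hand-written tokens for:
    // "-- if comment\nif (x) --mid if comment\nreturn result;"
    static List<Tok> sampleTokens() {
        return Arrays.asList(
            new Tok("-- if comment", HIDDEN), new Tok("\n", HIDDEN),
            new Tok("if", DEFAULT),                      // index 2
            new Tok(" ", HIDDEN), new Tok("(", DEFAULT),
            new Tok("x", DEFAULT), new Tok(")", DEFAULT),
            new Tok(" ", HIDDEN), new Tok("--mid if comment", HIDDEN),
            new Tok("\n", HIDDEN),
            new Tok("return", DEFAULT),                  // index 10
            new Tok(" ", HIDDEN), new Tok("result", DEFAULT),
            new Tok(";", DEFAULT));
    }

    public static void main(String[] args) {
        List<Tok> tokens = sampleTokens();
        System.out.println(commentsBefore(tokens, 2));  // [-- if comment]
        System.out.println(commentsBefore(tokens, 10)); // [--mid if comment]
    }
}
```

With the real runtime, the index would typically come from a parse tree node, for example ctx.getStart().getTokenIndex(), and the list would come from BufferedTokenStream.getHiddenTokensToLeft. Because the hidden tokens keep their original text, the comment is recovered with its spaces.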



Source: https://stackoverflow.com/questions/23976617/parsing-single-line-comments
