Antlr - Parsing Multiline #define for C.g4

我是研究僧i submitted on 2019-12-24 07:37:40

Question


I am using ANTLR4 to parse C code. I want to parse multiline #defines along with the C grammar provided in C.g4.

However, the grammar mentioned in the link above does not support preprocessor directives, so I have added the following rules to handle them.

Link to my previous question

Whitespace
    :   [ \t]+
        -> channel(HIDDEN)
    ;

Newline
    :   (   '\r' '\n'?
        |   '\n'
        )
        -> channel(HIDDEN)
    ;

BlockComment
    :   '/*' .*? '*/'
    ;

LineComment
    :   '//' ~[\r\n]*
    ;


IncludeBlock
    :   '#' Whitespace? 'include' ~[\r\n]*
    ;

DefineStart
    :   '#' Whitespace? 'define'
    ;

DefineBlock
    :   DefineStart ~[\r\n]*
    ;

MultiDefine
    :   DefineStart MultiDefineBody
    ;

MultiDefineBody
    :   [\\] [\r\n]+ MultiDefineBody
    |   ~[\r\n]
    ;



preprocessorDeclaration
    :   includeDeclaration
    |   defineDeclaration
    ;

includeDeclaration
    :   IncludeBlock
    ;

defineDeclaration
    :   DefineBlock | MultiDefine
    ;

comment
    :   BlockComment
    |   LineComment
    ;

declaration
    :   declarationSpecifiers initDeclaratorList ';'
    |   declarationSpecifiers ';'
    |   staticAssertDeclaration
    |   preprocessorDeclaration
    |   comment
    ;

It works for single-line preprocessor directives only if the MultiDefine rule is removed, but it does not work for multiline #defines.

Any help will be appreciated.

By a multiline #define I mean:

#define MACRO(num, str) {\
            printf("%d", num);\
            printf(" is");\
            printf(" %s number", str);\
            printf("\n");\
           }

Basically, I need a grammar that can parse the block above.
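For background: C deletes every backslash that is immediately followed by a newline during translation phase 2 (line splicing), before preprocessing directives are recognized, so the six physical lines above form a single logical #define line. The minimal C sketch below models that splicing step. The helper name splice_lines is hypothetical, and as an assumption it also treats a backslash followed by \r\n as a splice, because a lexer working on the raw file (as the grammar above does) sees Windows line endings unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src into dst, deleting every backslash that is
 * immediately followed by a newline ("\n" or, by assumption, "\r\n") together
 * with that newline. This models C's line splicing, after which a multiline
 * #define is one logical line. dst needs room for strlen(src) + 1 bytes.
 * Returns the length of the spliced string. */
size_t splice_lines(const char *src, char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n') {
            src += 2;                       /* drop backslash + LF */
        } else if (src[0] == '\\' && src[1] == '\r' && src[2] == '\n') {
            src += 3;                       /* drop backslash + CRLF */
        } else {
            dst[out++] = *src++;            /* keep every other character */
        }
    }
    dst[out] = '\0';
    return out;
}
```

Applied to the MACRO definition above, the result is one line that starts with #define MACRO(num, str) { and ends with }, which is the unit a lexer rule for #define has to cover.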


Answer 1:


I'm shamelessly copying part of my answer from here:

This is because of how ANTLR's lexer chooses between rules: it takes the rule that matches the longest possible stretch of input, and only when several rules match input of the same length does it pick the one specified first in the grammar.

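In your case, for an input sequence consisting of DefineStart, some text, a backslash and a line break (where DefineStart stands for an input sequence corresponding to the respective rule), DefineBlock produces the longer match, because its ~[\r\n]* construct consumes everything up to the line break, including the backslash. MultiDefine, as written, can only match DefineStart followed by a single non-newline character (optionally preceded by repetitions of a backslash followed by line breaks), so for the macro above it matches nothing more than "#define " and loses to DefineBlock.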
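To make the length comparison concrete, here is a small C sketch that hand-emulates the question's DefineStart, DefineBlock, MultiDefine and MultiDefineBody rules and reports the longest match each rule can produce at the start of the input. It is only a model of those rules, not ANTLR code, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a DefineStart match ('#' Whitespace? 'define'), or 0 if none. */
static size_t define_start_len(const char *s)
{
    size_t n = 0;

    if (s[n] != '#')
        return 0;
    n++;
    while (s[n] == ' ' || s[n] == '\t')     /* Whitespace? == [ \t]* */
        n++;
    return strncmp(s + n, "define", 6) == 0 ? n + 6 : 0;
}

static int is_newline(char c) { return c == '\r' || c == '\n'; }

/* Longest match of the question's rule at s, or -1 if it cannot match:
 *   MultiDefineBody : [\\] [\r\n]+ MultiDefineBody | ~[\r\n] ;      */
static long multi_define_body_len(const char *s)
{
    long best = -1;
    size_t k;

    if (s[0] != '\0' && !is_newline(s[0]))
        best = 1;                           /* alternative 2: one character */
    if (s[0] == '\\') {                     /* alternative 1 */
        /* s[1..k] are the k line-break characters, then recurse. */
        for (k = 1; is_newline(s[k]); k++) {
            long rest = multi_define_body_len(s + k + 1);
            if (rest >= 0 && (long)(k + 1) + rest > best)
                best = (long)(k + 1) + rest;
        }
    }
    return best;
}

/* Longest match of DefineBlock : DefineStart ~[\r\n]* ; (0 if none). */
size_t define_block_len(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = define_start_len(s);
    if (n == 0)
        return 0;
    while (s[n] != '\0' && !is_newline(s[n]))
        n++;                                /* swallows a trailing backslash too */
    return n;
}

/* Longest match of MultiDefine : DefineStart MultiDefineBody ; (0 if none). */
size_t multi_define_len(const char *s)
{
    size_t n;
    long body;

    assert(s != NULL);
    n = define_start_len(s);
    if (n == 0)
        return 0;
    body = multi_define_body_len(s + n);
    return body < 0 ? 0 : n + (size_t)body;
}
```

On the first line of MACRO, define_block_len returns 26 (up to and including the backslash) while multi_define_len returns 8 ("#define" plus the following space), so the lexer emits DefineBlock. MultiDefine only produces the longer match when a backslash and line break follow the define keyword directly, and even then the token stops one character after the line break(s).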

You now have two possibilities: either you tweak your current set of rules to circumvent this problem, or (my suggestion) you simply use one rule for matching a define statement, both single-line and multiline.

Such a merged rule could look like this:

DefineBlock:
    DefineStart (~[\\\r\n] | '\\' '\r'? '\n' | '\\' .)*
;

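Note that this rule is untested, but it should read like this: match DefineStart, followed by an arbitrarily long sequence in which each element is either a character other than \, \r or \n; an escaped newline (a backslash followed by \r\n or \n); or a backslash followed by any other character. The token therefore ends at the first line break that is not preceded by a backslash, which gives the desired newline escaping. It replaces DefineBlock, MultiDefine and MultiDefineBody from the question, so defineDeclaration only needs to reference DefineBlock.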
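The merged rule can be mirrored in plain C to check its behaviour by hand. In this sketch (hypothetical function names, not code generated by ANTLR) each branch inside the loop corresponds to one alternative of the rule, and the scan stops at the first line break that is not preceded by a backslash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a DefineStart match ('#' [ \t]* "define"), or 0 if none. */
static size_t define_start_len(const char *s)
{
    size_t n = 0;

    if (s[n] != '#')
        return 0;
    n++;
    while (s[n] == ' ' || s[n] == '\t')
        n++;
    return strncmp(s + n, "define", 6) == 0 ? n + 6 : 0;
}

/* Mirrors  DefineStart (~[\\\r\n] | '\\' '\r'? '\n' | '\\' .)*
 * Returns the number of characters the token would cover, or 0 if the
 * input does not start with DefineStart. */
size_t merged_define_len(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = define_start_len(s);
    if (n == 0)
        return 0;
    while (s[n] != '\0') {
        if (s[n] == '\\') {
            if (s[n + 1] == '\r' && s[n + 2] == '\n')
                n += 3;          /* '\\' '\r'? '\n' : escaped CRLF        */
            else if (s[n + 1] != '\0')
                n += 2;          /* escaped LF, or '\\' . (any character) */
            else
                break;           /* lone trailing backslash is not matched */
        } else if (s[n] == '\r' || s[n] == '\n') {
            break;               /* unescaped line break ends the directive */
        } else {
            n++;                 /* ~[\\\r\n] : ordinary character */
        }
    }
    return n;
}
```

For the MACRO definition from the question, merged_define_len covers all six physical lines and stops at the newline after the closing brace; for two consecutive single-line #defines it stops at the end of the first one.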



Source: https://stackoverflow.com/questions/48320194/antlr-parsing-multiline-define-for-c-g4
