问题
I am using Antlr4 to parse C code. I want to parse multiline #defines alongwith C.g4 provided in C.g4
But the grammar mentioned in the link above does not support preprocessor directives, so I have added the following new rules to support preprocessing.
Link to my previous question
Whitespace
: [ \t]+
-> channel(HIDDEN)
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> channel(HIDDEN)
;
BlockComment
: '/*' .*? '*/'
;
LineComment
: '//' ~[\r\n]*
;
IncludeBlock
: '#' Whitespace? 'include' ~[\r\n]*
;
DefineStart
: '#' Whitespace? 'define'
;
DefineBlock
: DefineStart ~[\r\n]*
;
MultiDefine
: DefineStart MultiDefineBody
;
MultiDefineBody
: [\\] [\r\n]+ MultiDefineBody
| ~[\r\n]
;
preprocessorDeclaration
: includeDeclaration
| defineDeclaration
;
includeDeclaration
: IncludeBlock
;
defineDeclaration
: DefineBlock | MultiDefine
;
comment
: BlockComment
| LineComment
;
declaration
: declarationSpecifiers initDeclaratorList ';'
| declarationSpecifiers ';'
| staticAssertDeclaration
| preprocessorDeclaration
| comment
;
It works only for Single line pre-processor directives if MultiBlock rule is removed But for multiline #defines it is not working.
Any help will be appreciated
By Multiline #define I mean
#define MACRO(num, str) {\
printf("%d", num);\
printf(" is");\
printf(" %s number", str);\
printf("\n");\
}
Basically I need to find a grammar that can parse the above block
回答1:
I'm shamelessly copying part of my answer from here:
This is because ANTLR's lexer matches "first come, first serve". That means it will tray to match the given input with the first specified (in the source code) rule and if that one can match the input, it won't try to match it with the other ones.
In your case the input sequence DefineStart \\\r\n
(where DefineStart
stands for an input-sequence corresponsing to the respective rule) will be matched by DefineBlock
because the \\
is being consumed by the ~[\r\n]*
construct.
You now have two possibilities: Either you tweak your current set of rules in order to circumvent this problem or (my sugestion) you simply use one rule for matching a define-statement (single and multiline).
Such a merged rule could look like this:
DefineBlock:
DefineStart (~[\\\r\n] | '\\\\' '\r'? '\n' | '\\'. )*
;
Note that this code is untested but it should read like this: Match DefineStart
and afterwards an arbitrary long character sequence matching the following pattern: The current character is either not \
, \r
or \n
, it is an escaped newline or a backslash followed by an arbitrary character.
This should allow for the wished newline-escaping.
来源:https://stackoverflow.com/questions/48320194/antlr-parsing-multiline-define-for-c-g4