Question
I am working with ANTLR4 and writing a grammar to handle single- and double-quoted strings. I am trying to use lexer modes to scope the strings, but that is not working out for me; my grammar is listed below. Is this the right approach, or how can I properly parse these as tokens instead of as parser rules with context? Any insight?
An example:
'single quote that contain "a double quote 'that has another single quote'"'
Lexer Grammar
lexer grammar StringLexer;
fragment SQUOTE: '\'';
fragment QUOTE: '"';
SQSTR_START: SQUOTE -> pushMode(SQSTR_MODE);
DQSTR_START: QUOTE -> pushMode(DQSTR_MODE);
CONTENTS: ~["\']+;
mode SQSTR_MODE;
SQSTR_END: (CONTENTS | DQSTR_START)+ SQUOTE -> popMode;
mode DQSTR_MODE;
DQSTR_END:(CONTENTS | SQSTR_START)+ QUOTE -> popMode;
Parser
parser grammar StringParser;
options { tokenVocab=StringLexer; }
start:
dqstr | sqstr
;
dqstr:
DQSTR_START DQSTR_END
;
sqstr:
SQSTR_START SQSTR_END
;
ADDENDUM: Thanks @Lucas Trzesniewski for the answer.
This is part of a grammar I am writing to parse a shell-like language. A script can have multiple lines, each of which may contain SQSTR and DQSTR tokens. With the lexer rules provided in the answer, multiple lines of script get lumped together.
Happy case example (parsed correctly using the answer):
cmd 'single quote string'
cmd2 "double quote"
cmd3 'another single quote'
This gets recognized as three commands and three strings (single- and double-quoted).
Unparsed example. Note the double quote inside the single-quoted strings:
cmd 'single "quote string'
cmd2 "double quote"
cmd3 'another "single quote'
In this case, the lexer incorrectly detects all of them as a single string token of type SQSTR.
Any ideas how to address this problem?
Answer 1:
If you want to parse your example string as a single token, you don't necessarily have to use lexer modes; you can use mutually recursive lexer rules instead:
SQSTR : '\'' (~['"] | DQSTR)* '\'';
DQSTR : '"' (~['"] | SQSTR)* '"';
Then, in the parser, use something like:
str : SQSTR | DQSTR;
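To see how these two rules behave on a given input without generating a lexer, here is a minimal Python sketch that simulates them by hand. It is not ANTLR code: the function name match_nested_string is hypothetical, and the sketch assumes only what the rules above state, namely that a quote of the same type as the innermost open string closes it, and a quote of the other type opens a nested string.

```python
def match_nested_string(text: str, start: int = 0) -> int:
    """Simulate the mutually recursive SQSTR/DQSTR rules (hypothetical helper).

    Returns the end index (exclusive) of the string token that begins at
    `start`, or -1 if no complete token begins there.
    """
    if start >= len(text) or text[start] not in "'\"":
        return -1
    stack = [text[start]]          # quote characters of the strings still open
    pos = start + 1
    while pos < len(text):
        ch = text[pos]
        if ch in "'\"":
            if ch == stack[-1]:
                stack.pop()        # same quote type: closes the innermost string
                if not stack:
                    return pos + 1 # outermost string closed: token ends here
            else:
                stack.append(ch)   # other quote type: opens a nested string
        pos += 1                   # any other character is plain content
    return -1                      # input ended with strings still open


nested = "'single quote that contain \"a double quote 'that has another single quote'\"'"
print(match_nested_string(nested) == len(nested))

script = "cmd 'single \"quote string'\ncmd2 \"double quote\"\ncmd3 'another \"single quote'"
first_quote = script.index("'")
print(repr(script[first_quote:match_nested_string(script, first_quote)]))
```

On the nested example the whole input comes out as one token. On a multi-line script where a single-quoted string contains a lone double quote, that double quote opens a nested string, so the token only ends at the last quote of the script.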
Answer 2:
What you have in mind is way too complicated. Where did you see such a solution before? (Almost) all grammars in the grammar repository on GitHub that have such rules use a simple approach that works well: an introducer, content, and a terminator, all in one rule, e.g.:
SQSTRING: '\'' .*? '\'';
DQSTRING: '"' .*? '"';
The same applies to all other elements with that kind of structure (single-quoted strings, backtick-quoted strings, multiline comments, etc.).
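As a rough way to check what the two non-greedy rules would pick out of a script, the following Python sketch approximates them with a regular expression. This is an approximation rather than ANTLR itself: the names STRING_TOKEN and find_strings are hypothetical, and text outside the quotes is simply skipped here, whereas a real lexer would need other rules to tokenize it.

```python
import re
from typing import List

# Approximation of SQSTRING and DQSTRING: an opening quote, the shortest
# possible content, then the same quote again. DOTALL lets '.' match newlines.
STRING_TOKEN = re.compile("'.*?'|\".*?\"", re.DOTALL)


def find_strings(text: str) -> List[str]:
    """Return every quoted string found in `text`, in order (hypothetical helper)."""
    return [m.group(0) for m in STRING_TOKEN.finditer(text)]


script = "cmd 'single \"quote string'\ncmd2 \"double quote\"\ncmd3 'another \"single quote'"
for token in find_strings(script):
    print(token)
```

With this approach a lone double quote inside a single-quoted string is just content, so the three-line script yields three separate strings. The trade-off is that a string ends at the first matching quote, so the nested example from the question is no longer recognized as a single token.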
Source: https://stackoverflow.com/questions/39938926/handling-scope-for-single-and-double-quote-strings-in-antlr4