Antlr4: single quote rule fails when there are escape chars plus carriage return, new line

Posted by 孤街浪徒 on 2019-12-11 15:52:55

Question


I have a grammar as such:

grammar Testquote;
program : (Line ';')+ ;
Line: L_S_STRING ;
L_S_STRING  : '\'' (('\'' '\'') | ('\\' '\'') | ~('\''))* '\''; // Single quoted string literal
L_WS        : L_BLANK+ -> skip ;   // Whitespace
fragment L_BLANK : (' ' | '\t' | '\r' | '\n') ;

This grammar, and L_S_STRING in particular, seems to work fine with vanilla inputs like:

'ab';
'cd';

However, it fails with this input:

'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';
'cd';

Yet it works when I change the first line to either 'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z'''; or 'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\' ';

I can sort of see why the parser might choose this failing route. But is there some way I can tell it to choose differently?


Answer 1:


According to the ANTLR4 docs, both lexer and parser rules are greedy: they match as much input as they can. In your case:

'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';
                               ^^^
'cd';

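Your grammar is ambiguous: the three highlighted characters can be read either as \' followed by ' (an escaped quote, then the closing quote) or as \ followed by '' (a literal backslash, then a doubled quote). Here is how that plays out.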

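Without the 'cd' line, the lexer matches the string you intended, because that is a valid string under your grammar; the highlighted characters are read as \' followed by the closing '. But because the lexer is greedy, it switches to the other reading (\ followed by '') whenever that produces a longer match, which happens as soon as another unescaped ' appears later in the input. Here the ' that opens 'cd' is taken as the closing quote, so the first token swallows the ; and the line break as well.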
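The longest-match behavior can be reproduced outside ANTLR. The following is a minimal Python sketch, not ANTLR itself: the helper longest_string_token and the regex stand-ins for the three body alternatives are assumptions made for illustration. It explores every way the alternatives can consume the input and keeps the longest prefix that ends in a closing quote.

```python
import re

# Regex stand-ins for the three body alternatives of the original L_S_STRING
# rule: a doubled quote, a backslash-escaped quote, or any non-quote character.
ORIGINAL_BODY = [r"''", r"\\'", r"[^']"]


def longest_string_token(text, body_alternatives):
    """Return the longest prefix of text that is a complete quoted string.

    Hypothetical helper: it tries every combination of alternatives, which
    mimics a longest-match lexer rather than a first-match regex engine.
    """
    if not text.startswith("'"):
        return None
    patterns = [re.compile(p) for p in body_alternatives]
    seen = {1}       # offsets reachable after the opening quote
    pending = [1]
    best_end = None
    while pending:
        pos = pending.pop()
        # A quote at this offset could close the string here.
        if text.startswith("'", pos):
            best_end = max(best_end or 0, pos + 1)
        # Independently of that, try to keep consuming body alternatives.
        for pattern in patterns:
            m = pattern.match(text, pos)
            if m and m.end() not in seen:
                seen.add(m.end())
                pending.append(m.end())
    return text[:best_end] if best_end else None


first_line = r"'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';"
both_lines = first_line + "\n'cd';"

print(repr(longest_string_token(first_line, ORIGINAL_BODY)))
print(repr(longest_string_token(both_lines, ORIGINAL_BODY)))
```

With only the first line, the longest token stops at the intended closing quote. With both lines, it runs through the quote that opens 'cd'.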

This ambiguity exists because a backslash can be either a normal character or an escape character. The common way to remove it is to add an escape for the backslash itself (\\) and to stop matching a lone backslash as a normal character.
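As a rough illustration of that convention, here is a Python sketch. It is an assumption about what such a rule could look like, written as a regex rather than as ANTLR: a backslash always escapes the next character and is never matched on its own. The names ESCAPED_STRING and first_string are hypothetical.

```python
import re

# Assumed convention: a backslash always escapes the character after it, so a
# lone backslash is never matched as a plain character.
ESCAPED_STRING = re.compile(r"'(?:''|\\.|[^'\\])*'", re.DOTALL)


def first_string(text):
    """Return the string literal at the start of text, or None."""
    m = ESCAPED_STRING.match(text)
    return m.group() if m else None


print(first_string(r"'it\'s' 'x'"))   # backslash escapes the quote
print(first_string(r"'C:\\' 'x'"))    # backslash escapes a backslash
print(first_string(r"'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';"))
```

Note that under this convention \\' means an escaped backslash followed by a real quote, so the input from the question would end its first string right after dd\\'. That is presumably not what the question intends, which is why a different resolution may suit that input better.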

Alternatively, you can resolve the ambiguity in a different way. If you want \' to take priority over '', write:

L_S_STRING  : '\'' ( ('\'\'') | ('\\'+ ~'\\') | ~('\'' | '\\') )* '\'' ;

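This works for your input: a run of backslashes is now always consumed together with the character that follows it, so the highlighted \'' can only be read as \' followed by the closing '.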
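One way to sanity-check the revised rule outside ANTLR is a small Python sketch. The regex below is a hand transliteration of the rule, and the token name SEMI and the tokenize helper are assumptions for illustration. Python's regex engine is not a longest-match lexer, but for this rule the two should agree, because a backslash can no longer be read in two ways.

```python
import re

# Hand transliteration of the revised L_S_STRING rule: a doubled quote, a run
# of backslashes plus the next non-backslash character, or any character that
# is neither a quote nor a backslash.
L_S_STRING = r"'(?:''|\\+[^\\]|[^'\\])*'"

TOKEN = re.compile(
    "(?P<L_S_STRING>" + L_S_STRING + r")|(?P<SEMI>;)|(?P<L_WS>[ \t\r\n]+)"
)


def tokenize(text):
    """Split text into (token name, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"no token matches at offset {pos}")
        if m.lastgroup != "L_WS":   # mirrors "-> skip" in the grammar
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


source = r"'yyyy-MM-dd\\'T\\'HH:mm:ss\\'Z\\'';" + "\n'cd';"
for name, lexeme in tokenize(source):
    print(name, lexeme)
```

The two-line input from the question comes out as four tokens: a string, a semicolon, the string 'cd', and a semicolon.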

By the way, you can shorten your code for L_WS:

L_WS : [ \t\n\r]+ -> skip ;


Source: https://stackoverflow.com/questions/53465057/antlr4-single-quote-rule-fails-when-there-are-escape-chars-plus-carriage-return
