ANTLRv4: non-greedy rules

Posted by 旧时模样 on 2019-12-30 07:12:17

Question


I'm reading The Definitive ANTLR 4 Reference and have a question about one of the examples (p. 76):

STRING: '"' (ESC|.)*? '"';
fragment 
ESC: '\\"' | '\\\\' ;

The rule matches a typical C++ string: a character sequence enclosed in double quotes, which may itself contain \".

I expected the rule STRING to match the shortest possible string because of the non-greedy construct. So on seeing \", it would match \ with . and then match " with the closing '"' of the rule, since that gives the shortest possible string. Instead, \" is matched by ESC. This is not what I expected, and I don't understand why.

What exactly happens here? Does a separate DFA match (ESC|.) first, and then another DFA match STRING using the text already matched by the (ESC|.) construct? I have to admit I haven't read the book to the end.


Answer 1:


ANTLR 4 lexers normally operate with longest-match-wins behavior, without any regard for the order in which alternatives appear in the grammar. If two lexer rules match the same longest input sequence, only then is the relative order of those rules compared to determine how the token type is assigned.
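As a rough illustration of the longest-match rule with an order-based tie-break, here is a small Python sketch. It is not how ANTLR is implemented; the rule names IF and ID and the `next_token` helper are hypothetical and chosen only for this example.

```python
import re

# Hypothetical lexer rules, listed in grammar order (not taken from the question).
RULES = [("IF", re.compile(r"if")), ("ID", re.compile(r"[a-z]+"))]

def next_token(text, pos=0):
    """Return (rule_name, lexeme) by longest match; ties go to the earlier rule."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Only a strictly longer match replaces the current best,
        # so on equal length the rule listed first is kept.
        if m and (best is None or len(m.group(0)) > len(best[1])):
            best = (name, m.group(0))
    return best

print(next_token("iffy"))  # ID wins: it matches more input than IF
print(next_token("if"))    # both match "if"; IF wins because it is listed first
```

Rule order only matters in the second call, where both rules match exactly the same text.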

The behavior within a rule changes as soon as the lexer reaches a non-greedy optional or closure. From that moment forward to the end of the rule, all alternatives within that rule are treated as ordered, and the path with the lowest alternative wins. This seemingly strange behavior is actually responsible for the non-greedy handling, due to the way we order alternatives in the underlying ATN representation. When the lexer is in this mode and reaches the block (ESC|.), the ordering constraint requires it to use ESC if possible.
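The effect of ordered alternatives inside a non-greedy loop can be tried out with Python's `re` module, whose backtracking engine also tries alternatives left to right. This is only an analogy under that assumption, not ANTLR's ATN algorithm; the two patterns are hand-written translations of the STRING rule with and without the ESC alternative.

```python
import re

# Analogue of: STRING: '"' (ESC|.)*? '"';  with ESC: '\\"' | '\\\\'
WITH_ESC = re.compile(r'"(?:\\"|\\\\|.)*?"')
# The same rule with the ESC alternative removed: '"' .*? '"'
WITHOUT_ESC = re.compile(r'"(?:.)*?"')

# Input text: "a\"b" followed by some trailing characters.
text = r'"a\"b" tail'

with_esc = WITH_ESC.match(text).group(0)
without_esc = WITHOUT_ESC.match(text).group(0)

print(with_esc)     # ESC is tried before '.', so \" is consumed as one unit
print(without_esc)  # without ESC, the loop stops at the first '"' it sees
```

With ESC listed first, the backslash and the following quote are consumed together, so the escaped quote cannot close the string. Without ESC, the match ends at the escaped quote, which is the result the question expected.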



Source: https://stackoverflow.com/questions/18787242/antlrv4-non-greedy-rules
