Antlr Lexer exclude a certain pattern

Submitted by 孤者浪人 on 2019-12-12 02:50:15

Question


In an ANTLR lexer, how can I match a token like this:

A word made up of any non-whitespace characters, but without the sequence '.{' anywhere inside it. The best I can come up with is a semantic predicate.

WORD: WL+   {!getText().contains(".{")}?;
WL: ~[ \n\r\t];

I'm a bit wary of using a semantic predicate, though: WORD will be lexed millions of times, and I expect that evaluating a predicate on every match would hurt performance.

This comes from a requirement to parse something like:

TOKEN_ONE.{TOKEN_TWO}

where TOKEN_ONE can itself contain . and { characters, just not the two together as '.{'.

I'm using Antlr 4.


Answer 1:


You need to limit your predicate evaluation to the case immediately following a . in the input.

WORD
  : ( ~[. \t\r\n]
    | '.' {_input.LA(1)!='{'}?
    )+
  ;
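
To make the scanning logic of this rule easier to follow, here is a minimal hand-written sketch in plain Java that mimics it without the ANTLR runtime. The class name WordScanner and the method scanWord are hypothetical names introduced only for illustration. The sketch assumes the rule behaves as a greedy loop that stops at whitespace or at a '.' immediately followed by '{'; it is not the code ANTLR generates.

```java
// Hypothetical illustration only: a hand-rolled equivalent of the WORD rule,
// not ANTLR-generated code.
final class WordScanner {

    // Returns the exclusive end index of the word starting at `start`,
    // or `start` itself if no word character can be consumed there.
    static int scanWord(String input, int start) {
        int i = start;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Mirrors ~[. \t\r\n]: whitespace always ends the word.
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                break;
            }
            // Mirrors '.' {_input.LA(1)!='{'}? : a dot is accepted only
            // when the next character is not '{'. The lookahead runs
            // only on dots, so other characters pay no extra cost.
            if (c == '.' && i + 1 < input.length() && input.charAt(i + 1) == '{') {
                break;
            }
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String text = "TOKEN_ONE.{TOKEN_TWO}";
        int end = scanWord(text, 0);
        System.out.println(text.substring(0, end));
    }
}
```

A lone '{' or a '.' followed by any other character stays inside the word, which matches the requirement that TOKEN_ONE may contain either character on its own.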



Answer 2:


How about rephrasing your requirement as "a word contains any character except whitespace, a dot, or a left brace"?

Then the lexer rule is just:

 WORD:  ~[ \n\r\t.{]+;
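
As a rough way to see what this character-class approach accepts, the sketch below expresses the same exclusion set as a java.util.regex pattern. The class ExcludedCharsDemo and the method firstWord are hypothetical names, and the regex is an assumed stand-in for the lexer rule, not ANTLR itself. It requires at least one character per match, since a token rule that can match the empty string is generally a problem in a lexer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a regex stand-in for the lexer rule.
final class ExcludedCharsDemo {

    // One or more characters that are not space, newline, carriage
    // return, tab, dot, or left brace.
    private static final Pattern WORD = Pattern.compile("[^ \\n\\r\\t.{]+");

    // Returns the word found at the very start of the input, or an
    // empty string if the input does not begin with a word character.
    static String firstWord(String input) {
        Matcher m = WORD.matcher(input);
        return m.lookingAt() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstWord("TOKEN_ONE.{TOKEN_TWO}"));
        System.out.println(firstWord("a.b"));
    }
}
```

Note the trade-off: with this rule a word ends at every dot or left brace, so an input such as a.b is split at the dot instead of being kept as one word. That is simpler and needs no predicate, but it only fits if words never need to contain those characters.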


Source: https://stackoverflow.com/questions/19224181/antlr-lexer-exclude-a-certain-pattern
