ANTLR4 negative lookahead in lexer

Posted by 孤街浪徒 on 2019-11-28 06:26:32

Question


I am trying to define lexer rules for PostgreSQL SQL.

The problem is with the operator definition and the line comments conflicting with each other.

For example, @--- is an operator token @- followed by the -- comment, and not an operator token @---.

In Grako it would be possible to define a negative lookahead for the - fragment like this:

OP_MINUS: '-' ! ( '-' ) .

In ANTLR4 I could not find any way to roll back an already consumed fragment.

Any ideas?

Here is the original definition of what a PostgreSQL operator name can be:

The operator name is a sequence of up to NAMEDATALEN-1
(63 by default) characters from the following list:

 + - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on your choice of name:
-- and /* cannot appear anywhere in an operator name,
since they will be taken as the start of a comment.

A multicharacter operator name cannot end in + or -,
unless the name also contains at least one of these
characters:

~ ! @ # % ^ & | ` ?

For example, @- is an allowed operator name, but *- is not.
This restriction allows PostgreSQL to parse SQL-compliant
commands without requiring spaces between tokens.
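
The quoted restrictions can be read as a small validity check on a candidate operator name. The following is a minimal Python sketch of that reading, not part of the original question: the function name `is_valid_operator_name` is hypothetical, and the length limit of 63 assumes the default NAMEDATALEN of 64.

```python
# Characters allowed in an operator name, per the quoted documentation.
OP_CHARS = set("+-*/<>=~!@#%^&|`?")
# Characters that permit a multicharacter name to end in + or -.
SPECIAL_CHARS = set("~!@#%^&|`?")
# Assumption: NAMEDATALEN is at its default of 64, so names hold 63 characters.
MAX_LEN = 63


def is_valid_operator_name(name: str) -> bool:
    """Return True if `name` satisfies the quoted operator-name rules."""
    if not name or len(name) > MAX_LEN:
        return False
    if any(ch not in OP_CHARS for ch in name):
        return False
    # "--" and "/*" would be taken as the start of a comment.
    if "--" in name or "/*" in name:
        return False
    # A multicharacter name may end in + or - only if it also contains
    # at least one of the special characters.
    if len(name) > 1 and name[-1] in "+-":
        return any(ch in SPECIAL_CHARS for ch in name)
    return True
```

With this sketch, `@-` is accepted and `*-` is rejected, matching the example in the quoted text.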

Answer 1:


You can use a semantic predicate in your lexer rules to perform lookahead (or lookbehind) without consuming characters. For example, the following rule covers several of the restrictions on operator names.

OPERATOR
  : ( [+*<>=~!@#%^&|`?]
    | '-' {_input.LA(1) != '-'}?
    | '/' {_input.LA(1) != '*'}?
    )+
  ;
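
To make the effect of the two predicates easier to see, here is a rough Python simulation of how the rule above consumes characters. It is only a sketch under the assumption that the lexer keeps the longest run in which no `-` is directly followed by `-` and no `/` is directly followed by `*`; the function name `scan_operator` is hypothetical and nothing here is ANTLR API.

```python
# Characters accepted by the OPERATOR rule above.
OP_CHARS = set("+-*/<>=~!@#%^&|`?")


def scan_operator(text: str, start: int = 0) -> str:
    """Return the operator text matched at `start`, or "" if none."""
    i = start
    while i < len(text) and text[i] in OP_CHARS:
        # Peek at the next character without consuming it, which is the
        # role _input.LA(1) plays inside the predicates.
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if text[i] == "-" and nxt == "-":
            break  # a line comment starts here
        if text[i] == "/" and nxt == "*":
            break  # a block comment starts here
        i += 1
    return text[start:i]
```

For instance, `scan_operator("<=--note")` stops before the comment and yields `<=`, while `scan_operator("@-")` keeps the trailing `-` because nothing follows it.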

However, the above rule does not address the restrictions on including a + or - at the end of an operator. To handle that in the easiest way possible, I would probably separate the two cases into separate rules.

// this rule does not allow + or - at the end of an operator
OPERATOR
  : ( [*<>=~!@#%^&|`?]
    | ( '+'
      | '-' {_input.LA(1) != '-'}?
      )+
      [*<>=~!@#%^&|`?]
    | '/' {_input.LA(1) != '*'}?
    )+
  ;

// this rule allows + or - at the end of an operator and sets the token type to OPERATOR
// it requires a character from the special subset to appear in the operator
OPERATOR2
  : ( [*<>=+]
    | '-' {_input.LA(1) != '-'}?
    | '/' {_input.LA(1) != '*'}?
    )*
    [~!@#%^&|`?]
    OPERATOR?
    ( '+'
    | '-' {_input.LA(1) != '-'}?
    )+
    -> type(OPERATOR)
  ;
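
When testing a grammar like this, it can help to have an independent reference to compare tokens against. The Python sketch below is one possible oracle: it takes the longest run of operator characters, cuts it where a comment would start, and then strips trailing `+` or `-` unless a character from the special subset is present. This is a different technique from the two-rule grammar above, the function name `expected_operator` is hypothetical, and the NAMEDATALEN length limit is deliberately left out.

```python
OP_CHARS = set("+-*/<>=~!@#%^&|`?")
SPECIAL_CHARS = set("~!@#%^&|`?")


def expected_operator(text: str) -> str:
    """Return the operator expected at the start of `text`, or ""."""
    # Step 1: longest run of operator characters.
    n = 0
    while n < len(text) and text[n] in OP_CHARS:
        n += 1
    run = text[:n]

    # Step 2: cut the run at the first "--" or "/*", if any.
    cuts = [pos for pos in (run.find("--"), run.find("/*")) if pos != -1]
    if cuts:
        run = run[:min(cuts)]

    # Step 3: a multicharacter operator without a special character
    # may not end in + or -, so strip those trailing characters.
    if len(run) > 1 and not any(ch in SPECIAL_CHARS for ch in run):
        while len(run) > 1 and run[-1] in "+-":
            run = run[:-1]
    return run
```

Because this follows the quoted restrictions rather than the grammar itself, any input where the two disagree (a lone `+` or `-`, for example) is worth a closer look.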


Source: https://stackoverflow.com/questions/24194110/antlr4-negative-lookahead-in-lexer
