Trying to use keywords as identifiers in ANTLR4; not working

Question

I'm trying to get some SQL keywords accepted as identifiers when they are used in identifier position. The ANTLR book (p. 210) suggests this trick:

id : 'if' | 'call' | 'then' | ID ;

I've got something similar, but it's not working, and I assume it's a misunderstanding on my part. regular_ident is the parser rule for an identifier:

regular_ident :  // (1)
        KEYWORD_AS_IDENT
        |
        REGULAR_IDENT
    ;

REGULAR_IDENT is the main lexer rule for identifiers. It's roughly this (simplified here), and it works:

REGULAR_IDENT :
        [a-zA-Z]  ( [a-zA-Z0-9] * )
    ;

KEYWORD_AS_IDENT is the list of special words, here's an extract:

KEYWORD_AS_IDENT :  // (2)
[...snip...]
  | FILESTREAM
  | SPARSE
  | NO
  | ACTION
  | PERSISTED
  | FILETABLE_DIRECTORY
  | FILETABLE_COLLATE_FILENAME
  | FILETABLE_PRIMARY_KEY_CONSTRAINT_NAME
  | FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME
  | FILETABLE_FULLPATH_UNIQUE_CONSTRAINT_NAME
  | COLUMN_SET
  | ALL_SPARSE_COLUMNS
 ;

where the component tokens are defined elsewhere:

SPARSE : 'sparse' ;
NO     : 'no' ;
(etc)

If I give it fetch aaa as input ('aaa' is not a keyword), it parses.

But if I give it fetch sparse ('sparse' is a keyword), it fails.

Perhaps I'm being dumb, but I can't see why, as SPARSE is a member of KEYWORD_AS_IDENT. If I cut and paste some of (2) into (1) to get this:

regular_ident :
    FILESTREAM
  | SPARSE
  | NO
  | ACTION
  | PERSISTED
  | FILETABLE_DIRECTORY
        |
    REGULAR_IDENT
    ;

it is suddenly fine with fetch sparse, because it now treats 'sparse' as a regular_ident.

But why does (1) not work? I can fix it trivially by inlining all of KEYWORD_AS_IDENT into regular_ident, but I need to know what I'm missing.

All suggestions appreciated.

Answer 1:

I use your second approach in my own grammars (e.g. MySQL.g), as it was the only way I could get this working reliably. That grammar is still ANTLR3, however. I also used a bit of a hack to change the token type recognized by the keyword rule, so that it returns IDENTIFIER instead of the individual keyword tokens.

Answer 2:

Reply from Eric Vergnaud on the antlr-discussion Google group:

LAST is declared before KEYWORD_AS_IDENT so when the lexer encounters 'last', it generates a LAST token, not a KEYWORD_AS_IDENT. Your start rule does not accept LAST token as a valid input, hence the shouting. Your grammar will actually NEVER produce a KEYWORD_AS_IDENT token, because another valid token will match before. It seems you are trying to get the lexer do the job of the parser i.e. handle multiple semantic alternatives, but at the time the token reaches the parser it's too late... Have you tried making KEYWORD_AS_IDENT a parser rule (lowercase) rather than a lexer rule?
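
The tie-break described in that reply can be sketched with the question's own token names (SPARSE plays the role of LAST). This is only a minimal illustration, not the original grammar: the grammar name TieBreak is made up, and KEYWORD_AS_IDENT is cut down to a single alternative. When several lexer rules match the same longest run of characters, ANTLR4 picks the rule that appears first in the grammar.

```antlr
lexer grammar TieBreak;   // hypothetical name, for illustration only

// For the input 'sparse', all three rules below match the same six
// characters. ANTLR4 resolves the tie in favour of the rule listed
// first, so the lexer emits a SPARSE token.
SPARSE           : 'sparse' ;

// Matches exactly what SPARSE matches, but is listed later,
// so it never wins the tie and is never emitted.
KEYWORD_AS_IDENT : SPARSE ;

// Also matches 'sparse', but loses the tie to SPARSE as well.
REGULAR_IDENT    : [a-zA-Z] [a-zA-Z0-9]* ;
```

Moving KEYWORD_AS_IDENT above SPARSE would not help either: the lexer would then emit KEYWORD_AS_IDENT for every 'sparse', and any parser rule expecting the SPARSE keyword itself would stop matching.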

So my understanding of the lexer was faulty, and he's correct that I was trying to get it to do the parser's job.
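
Following that suggestion, a minimal sketch of the parser-rule version might look like the grammar below. The grammar name, the stmt start rule, the FETCH token and the WS rule are assumptions added so the example is complete, and the literals for FILESTREAM, ACTION and PERSISTED are guessed from the token names. Only 'sparse' and 'no' appear in the question.

```antlr
grammar KeywordIdent;     // hypothetical name, for illustration only

// Hypothetical start rule: 'fetch' followed by one identifier.
stmt : FETCH regular_ident EOF ;

regular_ident
    : keyword_as_ident
    | REGULAR_IDENT
    ;

// Parser rule (lowercase). The lexer still emits the individual
// keyword tokens; the parser decides that they may act as identifiers.
keyword_as_ident
    : FILESTREAM
    | SPARSE
    | NO
    | ACTION
    | PERSISTED
    ;

// Keyword tokens. Literals other than 'sparse' and 'no' are assumed.
FETCH      : 'fetch' ;
FILESTREAM : 'filestream' ;
SPARSE     : 'sparse' ;
NO         : 'no' ;
ACTION     : 'action' ;
PERSISTED  : 'persisted' ;

// Listed after the keywords so that a keyword wins the tie
// when both rules match the same text.
REGULAR_IDENT : [a-zA-Z] [a-zA-Z0-9]* ;

WS : [ \t\r\n]+ -> skip ;
```

With this layout, fetch sparse lexes as FETCH SPARSE and fetch aaa lexes as FETCH REGULAR_IDENT, and both match stmt through regular_ident.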

Source: https://stackoverflow.com/questions/35304065/trying-to-use-keywords-as-identifiers-in-antlr4-not-working

Tags

antlr4