Question
I'm trying to get some SQL keywords accepted as identifiers when they are used as identifiers. The ANTLR book (p. 210) suggests this trick:
id : 'if' | 'call' | 'then' | ID ;
I've got something similar, but it's not working, and I assume it's a misunderstanding on my part. regular_ident is the parser rule for an identifier:
regular_ident : // (1)
KEYWORD_AS_IDENT
|
REGULAR_IDENT
;
REGULAR_IDENT is the main lexer rule for identifiers. It's roughly this (simplified here), and it works:
REGULAR_IDENT :
[a-zA-Z] ( [a-zA-Z0-9] * )
;
KEYWORD_AS_IDENT is the list of special words; here's an extract:
KEYWORD_AS_IDENT : // (2)
[...snip...]
| FILESTREAM
| SPARSE
| NO
| ACTION
| PERSISTED
| FILETABLE_DIRECTORY
| FILETABLE_COLLATE_FILENAME
| FILETABLE_PRIMARY_KEY_CONSTRAINT_NAME
| FILETABLE_STREAMID_UNIQUE_CONSTRAINT_NAME
| FILETABLE_FULLPATH_UNIQUE_CONSTRAINT_NAME
| COLUMN_SET
| ALL_SPARSE_COLUMNS
;
where components are defined elsewhere:
SPARSE : 'sparse' ;
NO : 'no' ;
(etc.)
If I give it fetch aaa as input ('aaa' is not a keyword), it parses, but if I give it fetch sparse ('sparse' is a keyword), it fails.
Perhaps I'm being dumb, but I can't see why, as SPARSE is a member of KEYWORD_AS_IDENT.
If I cut & paste some of (2) into (1) to get this:
regular_ident :
FILESTREAM
| SPARSE
| NO
| ACTION
| PERSISTED
| FILETABLE_DIRECTORY
|
REGULAR_IDENT
;
it is suddenly OK with fetch sparse, as it now treats 'sparse' as a regular_ident. But why does (1) not work?
I can fix it trivially by inlining all of KEYWORD_AS_IDENT, but I need to know what I'm missing.
All suggestions appreciated.
Answer 1:
I'm using your second approach in my own grammars (e.g. MySQL.g), as this was the only way to get this working reliably. That grammar is still ANTLR3, however. I also used a bit of a hack to change the token type recognized by the rule keyword so that it returns IDENTIFIER instead of the individual keyword tokens.
Answer 2:
Reply from Eric Vergnaud on the antlr-discussion Google group:
LAST is declared before KEYWORD_AS_IDENT, so when the lexer encounters 'last', it generates a LAST token, not a KEYWORD_AS_IDENT. Your start rule does not accept a LAST token as a valid input, hence the shouting. Your grammar will actually NEVER produce a KEYWORD_AS_IDENT token, because another valid token will match before. It seems you are trying to get the lexer to do the job of the parser, i.e. handle multiple semantic alternatives, but at the time the token reaches the parser it's too late... Have you tried making KEYWORD_AS_IDENT a parser rule (lowercase) rather than a lexer rule?
So my understanding of the lexer was faulty, and he's correct that I was trying to get it to do the parser's job.
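For reference, the suggestion to turn KEYWORD_AS_IDENT into a parser rule might look roughly like the sketch below. This is an illustration under assumptions, not the grammar from the question: the grammar name, the fetch_stmt start rule, the FETCH token and the WS rule are hypothetical, and only a handful of the keywords are listed. The one point it is meant to show is that each keyword stays its own lexer token (declared before REGULAR_IDENT, so it wins when both match the same text) and the parser, not the lexer, decides that a keyword token is acceptable where an identifier is expected.

```antlr
grammar KeywordIdent;   // hypothetical grammar name

// Hypothetical start rule, only here so the sketch is self-contained.
fetch_stmt : FETCH regular_ident EOF ;

// Rule (1) from the question, now referring to a parser rule.
regular_ident
    : keyword_as_ident
    | REGULAR_IDENT
    ;

// Lowercase name, so this is a parser rule: it matches keyword tokens
// that the lexer has already produced.
keyword_as_ident
    : SPARSE
    | NO
    | ACTION
    | PERSISTED
    ;

// Keyword tokens come before REGULAR_IDENT: for input of equal length,
// the lexer picks the rule that is declared first.
FETCH     : 'fetch' ;       // assumed token, not shown in the question
SPARSE    : 'sparse' ;
NO        : 'no' ;
ACTION    : 'action' ;
PERSISTED : 'persisted' ;

REGULAR_IDENT : [a-zA-Z] [a-zA-Z0-9]* ;

WS : [ \t\r\n]+ -> skip ;   // assumed whitespace handling
```

With a grammar along these lines, fetch sparse should lex as FETCH SPARSE and be accepted through keyword_as_ident, while fetch aaa lexes as FETCH REGULAR_IDENT. It is equivalent to inlining the keyword list into regular_ident, but keeps the list in one named place.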
Source: https://stackoverflow.com/questions/35304065/trying-to-use-keywords-as-identifiers-in-antlr4-not-working