Question
Is there a way in ANTLR to mark certain tokens as having canonical output?
For example, given the grammar (excerpt)
words : FOO BAR BAZ ;
FOO : [Ff] [Oo] [Oo] ;
BAR : [Bb] [Aa] [Rr] ;
BAZ : [Bb] [Aa] [Zz] ;
SP : [ ] -> channel(HIDDEN);
words will match "FOO BAR BAZ", "foo bar baz", "Foo bAr baZ", etc.
When I call TokenStream#getText(Context), it'll return the tokens' actual text concatenated together.
Is there a way to "canonicalize" this output such that no matter what the input, all FOO tokens render as "Foo", BAR tokens render as "Bar", and BAZ tokens render as "Baz" (for example)?
Given any of the inputs above, I'd like to have the output "Foo Bar Baz".
Answer 1:
Any of the following options would work:
Implement your own method to obtain the text for a parse tree or range of tokens, and place the handling for certain known token types there.
Create your own Token class that knows to return the canonical form of certain tokens, and create a TokenFactory implementation that creates tokens of that type. Then use the setTokenFactory method to cause your lexer to produce those tokens.
Create your own TokenStream implementation that overrides the default behavior.
Explicitly specify the text in an action that runs prior to the creation of tokens:
FOO : [Ff] [Oo] [Oo] { _text = "Foo"; };
Other options are likely available as well.
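To make the first option a bit more concrete, here is a minimal sketch of the lookup-and-concatenate idea in plain Java. It deliberately does not depend on the ANTLR runtime: Tok is a stand-in for a lexer token, and the token type numbers and the CanonicalText class are hypothetical names made up for this illustration. With ANTLR you would presumably read the same two values from Token#getType() and Token#getText() and use the constants of your generated lexer instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for a lexer token (NOT an ANTLR class): just a type and the matched text.
final class Tok {
    final int type;
    final String text;

    Tok(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class CanonicalText {
    // Hypothetical token type numbers; a generated lexer defines its own constants.
    static final int FOO = 1, BAR = 2, BAZ = 3, SP = 4;

    // Token types that have a canonical rendering. Types not listed keep their text.
    private static final Map<Integer, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put(FOO, "Foo");
        CANONICAL.put(BAR, "Bar");
        CANONICAL.put(BAZ, "Baz");
    }

    // Concatenates the tokens' text, substituting the canonical form where one is registered.
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            sb.append(CANONICAL.getOrDefault(t.type, t.text));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tokens as a lexer might produce them for the input "fOO BAR baZ".
        List<Tok> tokens = Arrays.asList(
                new Tok(FOO, "fOO"), new Tok(SP, " "),
                new Tok(BAR, "BAR"), new Tok(SP, " "),
                new Tok(BAZ, "baZ"));
        System.out.println(render(tokens)); // prints "Foo Bar Baz"
    }
}
```

The same map could back the other options as well, for example a custom token class whose text getter consults it, so the canonical forms are defined in one place.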
Source: https://stackoverflow.com/questions/25812615/canonicalizing-token-text-in-antlr