Question
Is there a way in ANTLR to mark certain tokens as having canonical output?
For example, given the grammar (excerpt)
words : FOO BAR BAZ ;
FOO : [Ff] [Oo] [Oo] ;
BAR : [Bb] [Aa] [Rr] ;
BAZ : [Bb] [Aa] [Zz] ;
SP : [ ] -> channel(HIDDEN);
The rule words will match "FOO BAR BAZ", "foo bar baz", "Foo bAr baZ", etc.
When I call TokenStream#getText(Context), it returns the tokens' actual text concatenated together.
Is there a way to "canonicalize" this output such that, no matter what the input, all FOO tokens render as "Foo", BAR tokens render as "Bar", and BAZ tokens render as "Baz" (for example)?
Given any of the inputs above, I'd like to have the output "Foo Bar Baz".
Answer 1:
Any of the following options would work:
Implement your own method to obtain the text for a parse tree or range of tokens, and place the handling for certain known token types there.
Create your own Token class that knows to return the canonical form of certain tokens, and create a TokenFactory implementation that creates tokens of that type. Then use the setTokenFactory method to cause your lexer to produce those tokens.
Create your own TokenStream implementation that overrides the default behavior.
Explicitly specify the text in an action that runs prior to the creation of tokens:
FOO : [Ff] [Oo] [Oo] { _text = "Foo"; };
Other options are likely available as well.
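As a rough illustration of the first option (your own text-rendering method with special handling for known token types), here is a minimal sketch in plain Java. So that it runs without the ANTLR runtime, it uses a hypothetical stand-in class Tok and hand-written type constants. These names are assumptions for illustration only. In a real project you would walk the tokens of a TokenStream and read Token.getType() and Token.getText() instead, with the constants coming from your generated lexer.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an ANTLR token: just a type and the matched text.
final class Tok {
    final int type;
    final String text;

    Tok(int type, String text) {
        this.type = type;
        this.text = text;
    }
}

final class CanonicalText {
    // Assumed token type constants; a generated lexer would normally supply these.
    static final int FOO = 1, BAR = 2, BAZ = 3, SP = 4;

    // Canonical spelling for each token type that should be normalized.
    static final Map<Integer, String> CANONICAL =
            Map.of(FOO, "Foo", BAR, "Bar", BAZ, "Baz");

    // Concatenates token text, substituting the canonical form where one is
    // registered and falling back to the token's actual text otherwise.
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            sb.append(CANONICAL.getOrDefault(t.type, t.text));
        }
        return sb.toString();
    }
}
```

Given tokens for the input "fOO bAr BAZ" (FOO, SP, BAR, SP, BAZ), CanonicalText.render returns "Foo Bar Baz". Tokens without a registered canonical form, such as the whitespace token, pass through unchanged.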
Source: https://stackoverflow.com/questions/25812615/canonicalizing-token-text-in-antlr