Canonicalizing token text in ANTLR

Submitted by 那年仲夏 on 2019-12-24 17:00:11

Question


Is there a way in ANTLR to mark certain tokens as having canonical output?

For example, given the grammar (excerpt)

words : FOO BAR BAZ ;
FOO : [Ff] [Oo] [Oo] ;
BAR : [Bb] [Aa] [Rr] ;
BAZ : [Bb] [Aa] [Zz] ;
SP : [ ] -> channel(HIDDEN);

words will match "FOO BAR BAZ", "foo bar baz", "Foo bAr baZ", etc.

When I call TokenStream#getText(Context), it'll return the tokens' actual text concatenated together.

Is there a way to "canonicalize" this output such that no matter what the input, all FOO tokens render as "Foo", BAR tokens render as "Bar", and BAZ tokens render as "Baz" (for example)?

Given any of the inputs above, I'd like to have the output "Foo Bar Baz".


Answer 1:


Any of the following options would work:

  1. Implement your own method to obtain the text for a parse tree or range of tokens, and place the handling for certain known token types there.

  2. Create your own Token class that knows to return the canonical form of certain tokens, and create a TokenFactory implementation that creates tokens of that type. Then use the setTokenFactory method to cause your lexer to produce those tokens.

  3. Create your own TokenStream implementation that overrides the default behavior.

  4. Explicitly specify the text in an action that runs prior to the creation of tokens:

    FOO : [Ff] [Oo] [Oo] { _text = "Foo"; };
    

Other options are likely available as well.
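
As a rough illustration of option 1, here is a minimal sketch of a custom text-rendering method that substitutes a canonical spelling for known token types. To stay self-contained it does not use the ANTLR runtime: the `Tok` class is a hypothetical stand-in for ANTLR's `Token`, and the `FOO`/`BAR`/`BAZ`/`SP` constants are assumed values, not the ones a generated lexer would produce.

```java
import java.util.List;
import java.util.Map;

public class CanonicalText {
    // Assumed token type numbers; a generated lexer defines its own constants.
    static final int FOO = 1, BAR = 2, BAZ = 3, SP = 4;

    // Hypothetical stand-in for an ANTLR token: just a type and the matched text.
    static final class Tok {
        final int type;
        final String text;

        Tok(int type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    // Canonical spelling for each token type that has one.
    static final Map<Integer, String> CANONICAL =
            Map.of(FOO, "Foo", BAR, "Bar", BAZ, "Baz");

    // Concatenates token text, using the canonical form where one is registered
    // and the token's actual text otherwise (e.g. for whitespace).
    static String render(List<Tok> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            sb.append(CANONICAL.getOrDefault(t.type, t.text));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> input = List.of(
                new Tok(FOO, "fOO"), new Tok(SP, " "),
                new Tok(BAR, "bAr"), new Tok(SP, " "),
                new Tok(BAZ, "baZ"));
        System.out.println(render(input)); // prints "Foo Bar Baz"
    }
}
```

With the real runtime, the loop would presumably read `Token.getType()` and `Token.getText()` on the tokens covered by the context instead of the stand-in fields.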



Source: https://stackoverflow.com/questions/25812615/canonicalizing-token-text-in-antlr
