I have an ANTLR4 grammar for a domain-specific language that is embedded in a text template.
There are two modes:
- Text (whitespace should be preserved)
- Code (whitespace should be ignored)
Sample grammar part:
template
    : '{' templateBody '}'
    ;
templateBody
    : templateChunk*
    ;
templateChunk
    : code # codeChunk // dsl code, ignore whitespace
    | text # textChunk // any text, preserve whitespace
    ;
The rule for code may contain a nested reference to the template rule, so the parser must support nested sections that alternate between preserving and ignoring whitespace.
Lexer modes might help, but with some drawbacks:
- the code sections would have to be parsed in a separate compiler pass
- I doubt that nested sections could be mapped correctly
So far the most promising approach seems to be manipulating the hidden channels.
My question: Is there a best practice for meeting these requirements? Is there an example grammar that has already solved a similar problem?
Appendix:
The rest of the grammar could look as follows:
code
    : '@' function
    ;
function
    : Identifier '(' argument ')'
    ;
argument
    : function
    | template
    ;
text
    : Whitespace+
    | Identifier
    | .+
    ;
Identifier
    : LETTER (LETTER|DIGIT)*
    ;
Whitespace
    : [ \t\n\r] -> channel(HIDDEN)
    ;
fragment LETTER
    : [a-zA-Z]
    ;
fragment DIGIT
    : [0-9]
    ;
In this example, code has a dummy implementation that only shows it can contain nested code/template sections. In practice, code should support:
- multiple arguments
- arguments of primitive types (ints, strings, ...)
- maps and lists
- function evaluation
- ...
This is how I solved the problem in the end:
The idea is to enable/disable whitespace in a parser rule:
templateBody : {enableWs();} templateChunk* {disableWs();};
So we will have to define enableWs and disableWs in our parser base class:
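// Make tokens on the HIDDEN channel (whitespace) visible to the parser.
public void enableWs() {
    if (_input instanceof MultiChannelTokenStream) {
        ((MultiChannelTokenStream) _input).enable(HIDDEN);
    }
}
// Hide the HIDDEN channel again so the parser skips whitespace.
public void disableWs() {
    if (_input instanceof MultiChannelTokenStream) {
        ((MultiChannelTokenStream) _input).disable(HIDDEN);
    }
}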
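One thing to watch with nesting, assuming the code rule does not toggle the channel itself: when a nested templateBody finishes, disableWs() runs, and whitespace stays disabled after control returns to the enclosing templateBody. Below is a minimal sketch of one possible way around that, saving the previous state on a stack on entry and restoring it on exit. WhitespaceScopeSketch, enter, exit and demo are hypothetical names that are not part of ANTLR or of the grammar above; in a real parser, enter and exit would wrap the enable/disable calls on the token stream.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper (not an ANTLR class): tracks whether whitespace is
// visible in the current section and restores the enclosing section's
// setting when a nested section ends.
class WhitespaceScopeSketch {
    private final Deque<Boolean> saved = new ArrayDeque<>();
    private boolean wsVisible = false; // assumption: code mode is the default

    // Enter a section; remember the enclosing section's setting.
    void enter(boolean visible) {
        saved.push(wsVisible);
        wsVisible = visible;
    }

    // Leave a section; restore whatever the enclosing section used.
    void exit() {
        wsVisible = saved.pop();
    }

    boolean isWsVisible() {
        return wsVisible;
    }

    // Simulates: templateBody -> code -> nested templateBody -> back out.
    static String demo() {
        WhitespaceScopeSketch s = new WhitespaceScopeSketch();
        StringBuilder trace = new StringBuilder();
        s.enter(true);                                // outer templateBody
        trace.append(s.isWsVisible());
        s.enter(false);                               // code section
        trace.append(',').append(s.isWsVisible());
        s.enter(true);                                // nested templateBody
        trace.append(',').append(s.isWsVisible());
        s.exit();                                     // back in code
        trace.append(',').append(s.isWsVisible());
        s.exit();                                     // back in outer templateBody
        trace.append(',').append(s.isWsVisible());
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "true,false,true,false,true"
    }
}
```

The last value in the trace is the point of the sketch: after the code section ends, the outer template sees whitespace again, which a plain enable/disable pair would not guarantee.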
Now what is this MultiChannelTokenStream?
- ANTLR4 defines CommonTokenStream, a token stream that reads from only one channel. MultiChannelTokenStream is a token stream that reads from all enabled channels. For the implementation I took the source code of CommonTokenStream and replaced each reference to channel with channels (each equality comparison becomes a contains check).
An example implementation with the grammar above can be found at antlr4multichannel.
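To illustrate the "equality becomes contains" idea without depending on the ANTLR runtime, here is a simplified, self-contained model. It is not the antlr4multichannel code: Tok, MultiChannelStream, next() and the channel constants are stand-ins invented for this sketch, and the class deliberately does not extend any ANTLR type.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of a token stream that reads from a set of enabled
// channels. All names here are illustrative, not ANTLR API.
class MultiChannelSketch {
    static final int DEFAULT_CHANNEL = 0; // assumed channel numbers
    static final int HIDDEN = 1;

    // Minimal stand-in for a lexer token: its text and its channel.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    static final class MultiChannelStream {
        private final List<Tok> tokens;
        private final Set<Integer> channels = new HashSet<>();
        private int p = 0;

        MultiChannelStream(List<Tok> tokens) {
            this.tokens = tokens;
            channels.add(DEFAULT_CHANNEL);
        }

        void enable(int channel) { channels.add(Integer.valueOf(channel)); }
        void disable(int channel) { channels.remove(Integer.valueOf(channel)); }

        // Returns and consumes the next visible token, or null at end of input.
        // A single-channel stream would test "channel == this.channel" here;
        // the multi-channel variant tests set membership instead.
        Tok next() {
            while (p < tokens.size() && !channels.contains(tokens.get(p).channel)) {
                p++;
            }
            return p < tokens.size() ? tokens.get(p++) : null;
        }
    }

    // Reads "a b c": the first two tokens in text mode, the rest in code mode.
    static String demo() {
        List<Tok> toks = new ArrayList<>();
        toks.add(new Tok("a", DEFAULT_CHANNEL));
        toks.add(new Tok(" ", HIDDEN));
        toks.add(new Tok("b", DEFAULT_CHANNEL));
        toks.add(new Tok(" ", HIDDEN));
        toks.add(new Tok("c", DEFAULT_CHANNEL));

        MultiChannelStream in = new MultiChannelStream(toks);
        StringBuilder out = new StringBuilder();
        in.enable(HIDDEN);               // text mode: whitespace is visible
        out.append(in.next().text);      // "a"
        out.append(in.next().text);      // " "
        in.disable(HIDDEN);              // code mode: whitespace is skipped
        for (Tok t = in.next(); t != null; t = in.next()) {
            out.append(t.text);          // "b", then "c"
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "a bc"
    }
}
```

Because the filtering happens when a token is requested rather than when the input is lexed, the same token list can be read with whitespace visible in one section and skipped in the next.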
Source: https://stackoverflow.com/questions/29060496/allow-whitespace-sections-antlr4