Hello everyone,
This could be a follow-up on this question: Antlr rule priorities
I'm trying to write an ANTLR grammar for t
Here's a quick demo of how you could parse this reStructuredText. Note that it handles only a small subset of the available markup syntax, and adding more will affect the existing parser and lexer rules, so there is much, much more work to be done!
grammar RST;

options {
  output=AST;
  backtrack=true;
  memoize=true;
}

tokens {
  ROOT;
  PARAGRAPH;
  INDENTATION;
  LINE;
  WORD;
  BOLD;
  ITALIC;
  INTERPRETED_TEXT;
  INLINE_LITERAL;
  REFERENCE;
}

parse
  :  paragraph+ EOF -> ^(ROOT paragraph+)
  ;

paragraph
  :  line+ -> ^(PARAGRAPH line+)
  |  Space* LineBreak -> /* omit line-breaks between paragraphs from AST */
  ;

line
  :  indentation text+ LineBreak -> ^(LINE text+)
  ;

indentation
  :  Space* -> ^(INDENTATION Space*)
  ;

text
  :  styledText
  |  interpretedText
  |  inlineLiteral
  |  reference
  |  Space
  |  Star
  |  EscapeSequence
  |  Any
  ;

styledText
  :  bold
  |  italic
  ;

bold
  :  Star Star boldAtom+ Star Star -> ^(BOLD boldAtom+)
  ;

italic
  :  Star italicAtom+ Star -> ^(ITALIC italicAtom+)
  ;

boldAtom
  :  ~(Star | LineBreak)
  |  italic
  ;

italicAtom
  :  ~(Star | LineBreak)
  |  bold
  ;

interpretedText
  :  BackTick interpretedTextAtoms BackTick -> ^(INTERPRETED_TEXT interpretedTextAtoms)
  ;

interpretedTextAtoms
  :  ~BackTick+
  ;

inlineLiteral
  :  BackTick BackTick inlineLiteralAtoms BackTick BackTick -> ^(INLINE_LITERAL inlineLiteralAtoms)
  ;

inlineLiteralAtoms
  :  inlineLiteralAtom+
  ;

inlineLiteralAtom
  :  ~BackTick
  |  BackTick ~BackTick
  ;

reference
  :  Any+ UnderScore -> ^(REFERENCE Any+)
  ;

UnderScore
  :  '_'
  ;

BackTick
  :  '`'
  ;

Star
  :  '*'
  ;

Space
  :  ' '
  |  '\t'
  ;

EscapeSequence
  :  '\\' ('\\' | '*')
  ;

LineBreak
  :  '\r'? '\n'
  |  '\r'
  ;

Any
  :  .
  ;
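To get a feel for what the lexer rules at the end of the grammar do, here is a rough hand-written approximation in plain Java. It is not the code ANTLR generates, and the class name RstLexerSketch and its tokenize method are made up for this illustration. It only shows the idea: almost every character becomes its own token, and only an escape sequence or a "\r\n" pair is grouped into a single token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: mimics the lexer rules of the RST grammar by hand.
// The token names are the lexer rule names from the grammar.
class RstLexerSketch {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // '\0' stands in for "no next character" at the end of the input
            char next = i + 1 < input.length() ? input.charAt(i + 1) : '\0';
            if (c == '_') {
                tokens.add("UnderScore");
                i++;
            } else if (c == '`') {
                tokens.add("BackTick");
                i++;
            } else if (c == '*') {
                tokens.add("Star");
                i++;
            } else if (c == ' ' || c == '\t') {
                tokens.add("Space");
                i++;
            } else if (c == '\\' && (next == '\\' || next == '*')) {
                // EscapeSequence : '\\' ('\\' | '*')
                tokens.add("EscapeSequence");
                i += 2;
            } else if (c == '\r' && next == '\n') {
                // LineBreak : '\r'? '\n'
                tokens.add("LineBreak");
                i += 2;
            } else if (c == '\r' || c == '\n') {
                tokens.add("LineBreak");
                i++;
            } else {
                // Any : .  (the fall-through for every other character)
                tokens.add("Any");
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("**b** `q`\n"));
    }
}
```

Because words are not grouped into a single token, the parser rules have to do all the work of deciding what a run of Star, BackTick and Any tokens means, which is why the grammar relies on backtrack=true.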
When you generate a parser and lexer from the above, and let it parse the following input file:
***x*** **yyy** *zz* *
a b c

P2 ``*a*`b`` `q`
Python_
(note the trailing line break!)
the parser will produce the following AST:

The graph can be created by running this class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    // the same input as shown above, including the trailing line break
    String source =
        "***x*** **yyy** *zz* *\n" +
        "a b c\n" +
        "\n" +
        "P2 ``*a*`b`` `q`\n" +
        "Python_\n";
    RSTLexer lexer = new RSTLexer(new ANTLRStringStream(source));
    RSTParser parser = new RSTParser(new CommonTokenStream(lexer));
    // invoke the entry point of the grammar and fetch the AST it builds
    CommonTree tree = (CommonTree)parser.parse().getTree();
    // convert the AST to DOT and print it to the console
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}
or if your source comes from a file, do:
RSTLexer lexer = new RSTLexer(new ANTLRFileStream("test.rst"));
or
RSTLexer lexer = new RSTLexer(new ANTLRFileStream("test.rst", "???"));
where "???" is the encoding of your file.
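Since the line rule in the grammar only matches when a line ends with a LineBreak, a file whose last line is not terminated will not parse. One way around that, as a minimal sketch, is to read the file yourself with an explicit encoding, append a line break if it is missing, and hand the resulting string to ANTLRStringStream as in the Main class. The class RstSourceLoader and its methods are hypothetical helpers made up for this example, and UTF-8 is just an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: loads RST source so that it always ends with a line break.
class RstSourceLoader {

    // Reads the whole file with the given encoding and normalizes the ending.
    static String load(Path file, Charset encoding) throws IOException {
        String source = new String(Files.readAllBytes(file), encoding);
        return ensureTrailingLineBreak(source);
    }

    // Appends "\n" unless the text already ends with '\n' or '\r'.
    static String ensureTrailingLineBreak(String source) {
        if (source.endsWith("\n") || source.endsWith("\r")) {
            return source;
        }
        return source + "\n";
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary file whose last line is not terminated.
        Path file = Files.createTempFile("demo", ".rst");
        Files.write(file, "Python_".getBytes(StandardCharsets.UTF_8));
        String source = load(file, StandardCharsets.UTF_8);
        // The string could then be passed to the generated lexer, e.g.:
        // RSTLexer lexer = new RSTLexer(new ANTLRStringStream(source));
        System.out.println(source.endsWith("\n"));
        Files.delete(file);
    }
}
```

If you know the file always ends with a line break, the ANTLRFileStream variants above are all you need.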
The class above prints the AST as a DOT file to the console. You can use a DOT viewer to display it. The image I posted was created with kgraphviewer, but there are many more viewers around. A nice online one is this one, which appears to use kgraphviewer under the hood. Good luck!