ANTLR grammar for reStructuredText (rule priorities)

旧巷少年郎 2020-12-30 17:10

First question stream

Hello everyone,

This could be a follow-up on this question: Antlr rule priorities

I'm trying to write an ANTLR grammar for t…

2 Answers
  •  情深已故
    2020-12-30 18:12

    Here's a quick demo of how you could parse this reStructuredText. Note that it handles only a small subset of the available markup syntax. Adding more syntax will affect the existing parser/lexer rules, so there is much, much more work to be done!

    Demo

    grammar RST;
    
    options {
      output=AST;
      backtrack=true;
      memoize=true;
    }
    
    tokens {
      ROOT;
      PARAGRAPH;
      INDENTATION;
      LINE;
      WORD;
      BOLD;
      ITALIC;
      INTERPRETED_TEXT;
      INLINE_LITERAL;
      REFERENCE;
    }
    
    parse
      :  paragraph+ EOF -> ^(ROOT paragraph+)
      ;
    
    paragraph
      :  line+ -> ^(PARAGRAPH line+)
      |  Space* LineBreak -> /* omit line-breaks between paragraphs from AST */
      ;
    
    line
      :  indentation text+ LineBreak -> ^(LINE text+)
      ;
    
    indentation
      :  Space* -> ^(INDENTATION Space*)
      ;
    
    text
      :  styledText
      |  interpretedText
      |  inlineLiteral
      |  reference
      |  Space
      |  Star
      |  EscapeSequence
      |  Any
      ;
    
    styledText
      :  bold
      |  italic
      ;
    
    bold
      :  Star Star boldAtom+ Star Star -> ^(BOLD boldAtom+)
      ;  
    
    italic
      :  Star italicAtom+ Star -> ^(ITALIC italicAtom+)
      ;
    
    boldAtom
      :  ~(Star | LineBreak)
      |  italic
      ;
    
    italicAtom
      :  ~(Star | LineBreak)
      |  bold
      ;
    
    interpretedText
      :  BackTick interpretedTextAtoms BackTick -> ^(INTERPRETED_TEXT interpretedTextAtoms)
      ;
    
    interpretedTextAtoms
      :  ~BackTick+
      ;
    
    inlineLiteral
      :  BackTick BackTick inlineLiteralAtoms BackTick BackTick -> ^(INLINE_LITERAL inlineLiteralAtoms)
      ;
    
    inlineLiteralAtoms
      :  inlineLiteralAtom+
      ;
    
    inlineLiteralAtom
      :  ~BackTick
      |  BackTick ~BackTick
      ;
    
    reference
      :  Any+ UnderScore -> ^(REFERENCE Any+)
      ;
    
    UnderScore
      :  '_'
      ;
    
    BackTick
      :  '`'
      ;
    
    Star
      :  '*'
      ;
    
    Space
      :  ' ' 
      |  '\t'
      ;
    
    EscapeSequence
      :  '\\' ('\\' | '*')
      ;
    
    LineBreak
      :  '\r'? '\n'
      |  '\r'
      ;
    
    Any
      :  .
      ;
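    Apart from EscapeSequence and the two-character \r\n LineBreak, every lexer rule in this grammar matches a single character. The parser rules therefore do all of the structural work. To make that tokenization concrete, here is a hand-written Java sketch (Java 16+ for records) that imitates those lexer rules with plain string scanning. It is not the ANTLR-generated RSTLexer. The class, enum and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written imitation of the RST grammar's lexer rules (NOT the ANTLR-generated RSTLexer). */
public class RstTokenSketch {

    enum Type { UNDER_SCORE, BACK_TICK, STAR, SPACE, ESCAPE_SEQUENCE, LINE_BREAK, ANY }

    record Token(Type type, String text) {}

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            boolean hasNext = i + 1 < input.length();
            char next = hasNext ? input.charAt(i + 1) : 0;
            if (c == '\\' && hasNext && (next == '\\' || next == '*')) {
                // EscapeSequence : '\\' ('\\' | '*')  -> one two-character token
                tokens.add(new Token(Type.ESCAPE_SEQUENCE, input.substring(i, i + 2)));
                i += 2;
                continue;
            }
            if (c == '\r' && hasNext && next == '\n') {
                // LineBreak : '\r'? '\n'  -> a Windows line ending is a single token
                tokens.add(new Token(Type.LINE_BREAK, "\r\n"));
                i += 2;
                continue;
            }
            Type type = switch (c) {
                case '_' -> Type.UNDER_SCORE;
                case '`' -> Type.BACK_TICK;
                case '*' -> Type.STAR;
                case ' ', '\t' -> Type.SPACE;
                case '\n', '\r' -> Type.LINE_BREAK; // '\n' alone, or a lone '\r'
                default -> Type.ANY;                // Any : .  (every other character)
            };
            tokens.add(new Token(type, String.valueOf(c)));
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print one token per line; newlines are shown escaped for readability
        for (Token t : tokenize("**yyy** Python_\n")) {
            System.out.println(t.type() + " " + t.text().replace("\n", "\\n"));
        }
    }
}
```

    Running main prints one token per line. The word yyy comes out as three separate ANY tokens, which is why parser rules such as boldAtom+ work on individual characters.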
    

    When you generate a lexer and parser from the grammar above and let them parse the following input file:

    ***x*** **yyy** *zz* *
    a b c
    
    P2 ``*a*`b`` `q`
    Python_
    
    

    (note the trailing line break!)

    the parser will produce the following AST:

    [image: the AST produced by the parser, rendered with kgraphviewer]
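    One detail of the sample input deserves a closer look: ``*a*`b`` is an inline literal that contains a single backtick. The hand-written Java sketch below imitates the inlineLiteral and inlineLiteralAtom rules to show why this works. It operates on characters rather than ANTLR tokens, and the class and method names are made up. A backtick is accepted as content only through the BackTick ~BackTick alternative, that is, only when the next character is not another backtick. The loop therefore stops exactly at the closing pair of backticks.

```java
/**
 * Character-level sketch of the grammar's rules
 *   inlineLiteral     : BackTick BackTick inlineLiteralAtom+ BackTick BackTick
 *   inlineLiteralAtom : ~BackTick | BackTick ~BackTick
 * Hand-written for illustration; not ANTLR-generated code.
 */
public class InlineLiteralSketch {

    /** Returns the literal's content if an inline literal starts at pos, otherwise null. */
    static String matchInlineLiteral(String s, int pos) {
        if (!s.startsWith("``", pos)) {
            return null;                        // needs two opening backticks
        }
        int start = pos + 2;
        int i = start;
        while (true) {
            if (i < s.length() && s.charAt(i) != '`') {
                i++;                            // alternative 1: ~BackTick
            } else if (i + 1 < s.length() && s.charAt(i) == '`' && s.charAt(i + 1) != '`') {
                i += 2;                         // alternative 2: BackTick ~BackTick
            } else {
                break;                          // two backticks (or end of input) end the loop
            }
        }
        if (i == start || !s.startsWith("``", i)) {
            return null;                        // no atoms at all, or no closing backticks
        }
        return s.substring(start, i);
    }

    public static void main(String[] args) {
        System.out.println(matchInlineLiteral("P2 ``*a*`b`` `q`", 3)); // *a*`b
    }
}
```

    Four backticks in a row yield null, because inlineLiteralAtom+ requires at least one atom between the opening and closing pairs.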

    EDIT

    The graph can be created by running this class:

    import org.antlr.runtime.*;
    import org.antlr.runtime.tree.*;
    import org.antlr.stringtemplate.*;
    
    public class Main {
      public static void main(String[] args) throws Exception {
        // Same sample input as above, including the trailing line break
        String source =
            "***x*** **yyy** *zz* *\n" +
            "a b c\n" +
            "\n" +
            "P2 ``*a*`b`` `q`\n" +
            "Python_\n";
        // Lexer and parser generated from the grammar file RST.g
        RSTLexer lexer = new RSTLexer(new ANTLRStringStream(source));
        RSTParser parser = new RSTParser(new CommonTokenStream(lexer));
        // parse is the start rule; with output=AST its return value carries the tree
        CommonTree tree = (CommonTree)parser.parse().getTree();
        // Convert the AST to Graphviz DOT text and print it
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
      }
    }
    

    or if your source comes from a file, do:

    RSTLexer lexer = new RSTLexer(new ANTLRFileStream("test.rst"));
    

    or

    RSTLexer lexer = new RSTLexer(new ANTLRFileStream("test.rst", "???"));
    

    where "???" is the encoding of your file.
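    The grammar's line rule requires every line, including the last one, to end in a LineBreak. A file without a final newline will therefore most likely produce a syntax error on its last line. One way to guard against this is to read the file yourself with an explicit charset, append a line break if it is missing, and pass the resulting string to an ANTLRStringStream instead of using ANTLRFileStream. The RstSource class below is a hypothetical helper, not part of ANTLR, and uses only java.nio.file.Files (Java 11+).

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of ANTLR) for loading .rst input for the RST grammar. */
public final class RstSource {
    private RstSource() {}

    /** Appends "\n" unless the text already ends with a line break ("\n" or a lone "\r"). */
    static String ensureTrailingLineBreak(String text) {
        if (text.endsWith("\n") || text.endsWith("\r")) {
            return text;
        }
        return text + "\n";
    }

    /** Reads the whole file with the given charset (e.g. UTF-8) and fixes the final line break. */
    static String load(Path file, Charset charset) throws IOException {
        return ensureTrailingLineBreak(Files.readString(file, charset));
    }

    public static void main(String[] args) throws IOException {
        // Write a file WITHOUT a trailing newline, then load it back
        Path tmp = Files.createTempFile("test", ".rst");
        Files.writeString(tmp, "Python_", StandardCharsets.UTF_8);
        System.out.println(load(tmp, StandardCharsets.UTF_8).endsWith("\n")); // true
        Files.delete(tmp);
    }
}
```

    With the generated classes from above, usage would then look like new RSTLexer(new ANTLRStringStream(RstSource.load(Path.of("test.rst"), StandardCharsets.UTF_8))).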

    The class above prints the AST in DOT format to the console, and you can use a DOT viewer to display it. The image I posted was created with kgraphviewer, but there are many other viewers around. A nice online one is this one, which appears to use kgraphviewer under the hood. Good luck!
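    If you have not worked with DOT before: it is a plain-text graph description that Graphviz-based viewers render. The hand-written Java sketch below (Java 16+ for records) serializes a small, made-up tree into DOT to show the basic shape. Every tree node becomes a labelled node, and every parent/child pair becomes an edge. The Node record and the n0, n1, ... naming are assumptions for illustration. The output of ANTLR's DOTTreeGenerator will differ in details such as styling and layout attributes.

```java
import java.util.List;

/** Hand-written DOT serializer for a toy tree (illustration only, not ANTLR's DOTTreeGenerator). */
public class DotSketch {

    /** Minimal tree node: a label plus child nodes (hypothetical type). */
    record Node(String label, List<Node> children) {
        Node(String label, Node... children) {
            this(label, List.of(children));
        }
    }

    static String toDot(Node root) {
        StringBuilder sb = new StringBuilder("digraph {\n");
        int[] counter = {0}; // shared counter so each node gets a unique id: n0, n1, ...
        emit(root, sb, counter);
        sb.append("}\n");
        return sb.toString();
    }

    /** Declares the node, emits each child followed by an edge to it, and returns the node's id. */
    private static String emit(Node node, StringBuilder sb, int[] counter) {
        String id = "n" + counter[0]++;
        sb.append("  ").append(id).append(" [label=\"").append(escape(node.label())).append("\"];\n");
        for (Node child : node.children()) {
            String childId = emit(child, sb, counter);
            sb.append("  ").append(id).append(" -> ").append(childId).append(";\n");
        }
        return id;
    }

    /** Escapes backslashes and double quotes inside a DOT string label. */
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // A made-up tree that borrows label names from the grammar; not the parser's real output
        Node tree = new Node("ROOT",
                new Node("PARAGRAPH",
                        new Node("LINE", new Node("BOLD", new Node("y"), new Node("y")))));
        System.out.print(toDot(tree));
    }
}
```

    You can save such text to a file like ast.dot and, if Graphviz is installed, render it with dot -Tpng ast.dot -o ast.png.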
