How to parse restructuredtext in python?

我与影子孤独终老i 提交于 2019-11-28 09:47:42
Gareth Latty

Docutils does indeed contain the tools to do this.

What you probably want is the parser at docutils.parsers.rst

See this page for details on what is involved. There are also some examples at docutils/examples.py - particularly check out the internals() function, which is probably of interest.

I'd like to extend upon the answer from Gareth Latty. "What you probably want is the parser at docutils.parsers.rst" is a good starting point of the answer, but what's next? Namely:

How to parse restructuredtext in python?

Below is the exact answer for Python 3.6 and docutils 0.14:

import docutils.nodes
import docutils.parsers.rst
import docutils.utils

def parse_rst(text: str) -> docutils.nodes.document:
    parser = docutils.parsers.rst.Parser()
    components = (docutils.parsers.rst.Parser,)
    settings = docutils.frontend.OptionParser(components=components).get_default_values()
    document = docutils.utils.new_document('<rst-doc>', settings=settings)
    parser.parse(text, document)
    return document

And the resulting document can be processed using, for example, below, which will print all references in the document:

class MyVisitor(docutils.nodes.NodeVisitor):

    def visit_reference(self, node: docutils.nodes.reference) -> None:
        """Called for "reference" nodes."""
        print(node)

    def unknown_visit(self, node: docutils.nodes.Node) -> None:
        """Called for all other node types."""
        pass

Here's how to run it:

doc = parse_rst('spam spam lovely spam')
visitor = MyVisitor(doc)
doc.walk(visitor)
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!