How do I validate an XML document using a compact RELAX NG schema in Python?

Submitted by 扶醉桌前 on 2019-12-01 04:16:58

How about using lxml?

From the docs:

>>> from io import StringIO
>>> from lxml import etree

>>> f = StringIO('''\
... <element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
...  <zeroOrMore>
...     <element name="b">
...       <text />
...     </element>
...  </zeroOrMore>
... </element>
... ''')
>>> relaxng_doc = etree.parse(f)
>>> relaxng = etree.RelaxNG(relaxng_doc)

>>> valid = StringIO('<a><b></b></a>')
>>> doc = etree.parse(valid)
>>> relaxng.validate(doc)
True

>>> invalid = StringIO('<a><c></c></a>')
>>> doc2 = etree.parse(invalid)
>>> relaxng.validate(doc2)
False
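
When validate() returns False, the validator's error_log records why. Here is a minimal sketch that reuses the schema from the docs example above; the helper name validation_errors is made up for illustration and is not part of lxml.

```python
from io import StringIO

from lxml import etree

# Same schema as in the docs example: <a> containing zero or more <b>.
SCHEMA = '''\
<element name="a" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="b">
      <text />
    </element>
  </zeroOrMore>
</element>
'''


def validation_errors(xml_text, schema_text=SCHEMA):
    """Return a list of (line, message) tuples; empty if the XML is valid.

    Hypothetical helper for illustration only.
    """
    relaxng = etree.RelaxNG(etree.parse(StringIO(schema_text)))
    doc = etree.parse(StringIO(xml_text))
    if relaxng.validate(doc):
        return []
    # error_log holds the messages recorded by the last validation run.
    return [(entry.line, entry.message) for entry in relaxng.error_log]


if __name__ == "__main__":
    for line, message in validation_errors('<a><c></c></a>'):
        print(line, message)
```

If you would rather get an exception than a boolean, relaxng.assertValid(doc) raises etree.DocumentInvalid for an invalid document.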

If you want to validate a document against a RELAX NG Compact Syntax schema from the command line, you can use pyjing, from the jingtrang package.

It supports .rnc files and displays more details than just True or False. For example:

C:\>pyjing -c root.rnc invalid.xml
C:\invalid.xml:9:9: error: element "name" not allowed here; expected the element end-tag or element "bounds"

NOTE: jingtrang is a Python wrapper around the Java tools Jing and Trang, so Java must be installed.
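
If you want those details inside a Python program, one option is to run pyjing as a subprocess and parse its output. The sketch below assumes every diagnostic follows the path:line:column: severity: message shape seen in the example output, and that it may arrive on either stdout or stderr. The names parse_diagnostics and run_pyjing are hypothetical helpers, not part of jingtrang.

```python
import re
import subprocess

# Shape of one diagnostic line, taken from the pyjing example output:
#   <path>:<line>:<column>: <severity>: <message>
# The lazy path group lets Windows drive letters ("C:\...") pass through.
DIAGNOSTIC = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>\w+): (?P<message>.*)$"
)


def parse_diagnostics(output):
    """Turn pyjing text output into a list of dicts, skipping other lines."""
    found = []
    for raw in output.splitlines():
        match = DIAGNOSTIC.match(raw.strip())
        if match:
            entry = match.groupdict()
            entry["line"] = int(entry["line"])
            entry["column"] = int(entry["column"])
            found.append(entry)
    return found


def run_pyjing(rnc_path, xml_path):
    """Run `pyjing -c` and return the parsed diagnostics (empty if none)."""
    # Assumption: diagnostics may be written to stdout or stderr, so read both.
    result = subprocess.run(
        ["pyjing", "-c", rnc_path, xml_path],
        capture_output=True,
        text=True,
    )
    return parse_diagnostics(result.stdout + "\n" + result.stderr)
```

For example, run_pyjing("root.rnc", "invalid.xml") would return one dict per reported problem, and an empty list when pyjing prints no diagnostics.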

If you want to validate from within Python, you can:

  1. Use pytrang (from the jingtrang wrapper) to convert the RELAX NG Compact Syntax schema (.rnc) to the RELAX NG XML syntax (.rng): pytrang root.rnc root.rng

  2. Use lxml to load the converted .rng file, as described at https://lxml.de/validation.html#relaxng

That would look something like this:

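>>> from io import StringIO
>>> from subprocess import check_call
>>> from lxml import etree

>>> # Pass the command as a list so it also runs without a shell on POSIX;
>>> # check_call raises CalledProcessError if the conversion fails.
>>> check_call(["pytrang", "root.rnc", "root.rng"])
0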

>>> with open("root.rng") as f:
...    relaxng_doc = etree.parse(f)
>>> relaxng = etree.RelaxNG(relaxng_doc)

>>> valid = StringIO('<a><b></b></a>')
>>> doc = etree.parse(valid)
>>> relaxng.validate(doc)
True

>>> invalid = StringIO('<a><c></c></a>')
>>> doc2 = etree.parse(invalid)
>>> relaxng.validate(doc2)
False