Registering namespaces with lxml before parsing

 ̄綄美尐妖づ submitted on 2021-02-11 05:05:25

Question


I am using lxml to parse XML from an external service that uses namespace prefixes but doesn't declare them with xmlns. I am trying to register the namespace by hand with register_namespace, but that doesn't seem to work.

from lxml import etree

xml = """
    <Foo xsi:type="xsd:string">bar</Foo>
"""

etree.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
el = etree.fromstring(xml) # lxml.etree.XMLSyntaxError: Namespace prefix xsi for type on Foo is not defined

What am I missing? Oddly enough, looking at the lxml source code to try and understand what I might be doing wrong, it seems as if the xsi namespace should already be there as one of the default namespaces.


Answer 1:


When an XML document is parsed and then saved again, lxml does not change any prefixes (and register_namespace has no effect).

If your XML document does not declare its namespace prefixes, it is not namespace-well-formed. Using register_namespace before parsing cannot fix this.
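Both points can be checked with a short sketch. The input strings here are made up for illustration; the first reuses the fragment from the question, the second is a hypothetical document that declares its own prefix x.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Registering a prefix does not make the parser accept an undeclared one.
etree.register_namespace("xsi", XSI)
try:
    etree.fromstring('<Foo xsi:type="xsd:string">bar</Foo>')
    parsed_ok = True
except etree.XMLSyntaxError:
    parsed_ok = False
print(parsed_ok)

# A prefix declared in the parsed document is kept on output,
# even though a different prefix is registered for the same URI.
etree.register_namespace("abc", "http://example.com")
root = etree.fromstring('<x:Foo xmlns:x="http://example.com"/>')
roundtrip = etree.tostring(root).decode()
print(roundtrip)
```

The first print shows False, and the second shows the document with its original x prefix rather than abc.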


register_namespace defines the prefixes to be used when serializing a newly created XML document.

Example 1 (without register_namespace):

from lxml import etree

el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())

Output:

<ns0:Foo xmlns:ns0="http://example.com"/>

Example 2 (with register_namespace):

from lxml import etree

etree.register_namespace("abc", "http://example.com")

el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())

Output:

<abc:Foo xmlns:abc="http://example.com"/>

Example 3 (without register_namespace, but with a "well-known" namespace associated with a conventional prefix):

from lxml import etree

el = etree.Element('{http://www.w3.org/2001/XMLSchema-instance}Foo')
print(etree.tostring(el).decode())

Output:

<xsi:Foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>



Answer 2:


Namespace-well-formed XML that uses namespace prefixes must also include the namespace declarations themselves. Adding an xmlns declaration to the first element is enough:

from lxml import etree

xml = """
    <Foo xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:type='xsd:string'>bar</Foo>
"""
el = etree.fromstring(xml)
print(el)

So, technically, if your XML uses xsi but it does not contain the namespace declaration, it's not (namespace) well-formed XML.
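If the XML comes from a service you cannot change, one possible workaround is to supply the missing declarations yourself before parsing. The sketch below is only one way to do that, not something lxml provides: parse_fragment and the wrapper element are hypothetical names, and it assumes the input is a single element with no XML declaration or DOCTYPE in front of it.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"


def parse_fragment(fragment, nsmap):
    """Hypothetical helper: parse a fragment whose prefixes are undeclared.

    The fragment is placed inside a throwaway <wrapper> element that
    declares every prefix in nsmap, so the parser can resolve them.
    Assumes the fragment has no <?xml ...?> declaration or DOCTYPE.
    """
    decls = " ".join('xmlns:%s="%s"' % (p, uri) for p, uri in nsmap.items())
    wrapped = "<wrapper %s>%s</wrapper>" % (decls, fragment)
    # The first child element of the wrapper is the original root.
    return etree.fromstring(wrapped)[0]


el = parse_fragment(
    '<Foo xsi:type="xsd:string">bar</Foo>',
    {"xsi": XSI, "xsd": XSD},
)
print(el.tag, el.text, el.get("{%s}type" % XSI))
```

The returned element still has the wrapper as its parent, so the declared prefixes stay in scope for it. Note that lxml resolves the xsi prefix on the attribute name only; the xsd: inside the attribute value is plain text as far as the parser is concerned.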

See also How to restrict the value of an XML element using xsi:type in XSD?



Source: https://stackoverflow.com/questions/59850806/registering-namespaces-with-lxml-before-parsing
