Question
I am using lxml to parse XML from an external service that uses namespace prefixes but doesn't declare them with xmlns. I am trying to register the prefix by hand with register_namespace, but that doesn't seem to work.
from lxml import etree
xml = """
<Foo xsi:type="xsd:string">bar</Foo>
"""
etree.register_namespace('xsi', 'http://www.w3.org/2001/XMLSchema-instance')
el = etree.fromstring(xml) # lxml.etree.XMLSyntaxError: Namespace prefix xsi for type on Foo is not defined
What am I missing? Oddly enough, when I looked at the lxml source code to understand what I might be doing wrong, it seemed as if the xsi namespace should already be there as one of the default namespaces.
Answer 1:
When an XML document is parsed and then saved again, lxml does not change any prefixes (and register_namespace has no effect).
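A quick way to see this, sketched with a hypothetical namespace URI (http://example.org/demo) and prefixes foo and abc chosen only for illustration: the parsed document keeps its own foo prefix even after abc has been registered for the same URI.

```python
from lxml import etree

# Hypothetical document that declares its own prefix "foo"
doc = '<foo:Bar xmlns:foo="http://example.org/demo"/>'

# Ask lxml to prefer "abc" for the same URI; this only matters for new elements
etree.register_namespace("abc", "http://example.org/demo")

el = etree.fromstring(doc)
out = etree.tostring(el).decode()
print(out)  # <foo:Bar xmlns:foo="http://example.org/demo"/>
```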
If your XML document does not declare its namespace prefixes, it is not namespace-well-formed. Using register_namespace before parsing cannot fix this.
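The following sketch reuses the XML from the question and checks that registering xsi beforehand does not change the parse result. The parses helper is hypothetical and only wraps etree.fromstring; only the variant that declares xmlns:xsi is accepted.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
xml = '<Foo xsi:type="xsd:string">bar</Foo>'
declared = '<Foo xmlns:xsi="%s" xsi:type="xsd:string">bar</Foo>' % XSI

# Registering a prefix only influences which prefix lxml picks for
# elements it creates; the parser does not consult this registry.
etree.register_namespace("xsi", XSI)

def parses(text):
    """Hypothetical helper: True if lxml accepts the text, False on a syntax error."""
    try:
        etree.fromstring(text)
        return True
    except etree.XMLSyntaxError as exc:
        print("XMLSyntaxError:", exc)
        return False

print(parses(xml))       # undeclared xsi prefix: still rejected
print(parses(declared))  # prefix declared in the document itself: accepted
```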
register_namespace defines the prefixes to be used when serializing a newly created XML document.
Example 1 (without register_namespace):
from lxml import etree
el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())
Output:
<ns0:Foo xmlns:ns0="http://example.com"/>
Example 2 (with register_namespace):
from lxml import etree
etree.register_namespace("abc", "http://example.com")
el = etree.Element('{http://example.com}Foo')
print(etree.tostring(el).decode())
Output:
<abc:Foo xmlns:abc="http://example.com"/>
Example 3 (without register_namespace, but with a "well-known" namespace associated with a conventional prefix):
from lxml import etree
el = etree.Element('{http://www.w3.org/2001/XMLSchema-instance}Foo')
print(etree.tostring(el).decode())
Output:
<xsi:Foo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
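A related detail, as far as I can tell from lxml's behavior: the registered prefix is picked up when an element is created, not when it is serialized. In this sketch (reusing http://example.com and abc from Examples 1 and 2), an element created before register_namespace keeps its generated ns0 prefix.

```python
from lxml import etree

URI = "http://example.com"  # same example namespace as above

before = etree.Element("{%s}Foo" % URI)  # created before registration
etree.register_namespace("abc", URI)
after = etree.Element("{%s}Foo" % URI)   # created after registration

print(etree.tostring(before).decode())  # <ns0:Foo xmlns:ns0="http://example.com"/>
print(etree.tostring(after).decode())   # <abc:Foo xmlns:abc="http://example.com"/>
```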
Answer 2:
Namespace-well-formed XML that uses namespace prefixes must also include the namespace declarations themselves. Adding an xmlns:xsi declaration to the first (root) element is enough:
from lxml import etree
xml = """
<Foo xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:type='xsd:string'>bar</Foo>
"""
el = etree.fromstring(xml)
print(el)
So, technically, if your XML uses xsi but does not contain the namespace declaration, it is not (namespace-)well-formed XML.
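If the XML comes from an external service you cannot change, one possible workaround is to wrap the payload in an element that declares the missing prefixes and then take its first child. This is a sketch that assumes the payload is a single element with no <?xml ...?> declaration or DOCTYPE (neither may appear inside another element); parse_undeclared and the wrapper element name are hypothetical, and the xsd URI is the standard XML Schema namespace.

```python
from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"

def parse_undeclared(fragment, prefixes):
    """Hypothetical helper: parse XML whose prefixes are used but never declared.

    Assumes `fragment` is a single element without an XML declaration or DOCTYPE.
    """
    decls = " ".join('xmlns:%s="%s"' % (p, uri) for p, uri in prefixes.items())
    wrapper = etree.fromstring("<wrapper %s>%s</wrapper>" % (decls, fragment))
    # The first child is the original root; its prefixes now resolve
    # through the declarations on the wrapper element.
    return wrapper[0]

xml = """
<Foo xsi:type="xsd:string">bar</Foo>
"""
el = parse_undeclared(xml, {"xsi": XSI, "xsd": XSD})
print(el.tag, el.text, el.get("{%s}type" % XSI))  # Foo bar xsd:string
```

Note that the returned element still has the wrapper as its parent (el.getparent()), which may matter for later tree navigation.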
See also "How to restrict the value of an XML element using xsi:type in XSD?"
Source: https://stackoverflow.com/questions/59850806/registering-namespaces-with-lxml-before-parsing