I have an xml doc that I am trying to parse using Etree.lxml
1&
import io
import lxml.etree as ET
content='''\
<Envelope xmlns="http://www.example.com/zzz/yyy">
<Header>
<Version>1</Version>
</Header>
<Body>
some stuff
</Body>
</Envelope>
'''
dom = ET.parse(io.BytesIO(content))
You can find namespace-aware nodes using the xpath
method:
body=dom.xpath('//ns:Body',namespaces={'ns':'http://www.example.com/zzz/yyy'})
print(body)
# [<Element {http://www.example.com/zzz/yyy}Body at 90b2d4c>]
If you really want to remove namespaces, you could use an XSL transformation:
# http://wiki.tei-c.org/index.php/Remove-Namespaces.xsl
xslt='''<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no"/>
<xsl:template match="/|comment()|processing-instruction()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
'''
xslt_doc=ET.parse(io.BytesIO(xslt))
transform=ET.XSLT(xslt_doc)
dom=transform(dom)
Here we see the namespace has been removed:
print(ET.tostring(dom))
# <Envelope>
# <Header>
# <Version>1</Version>
# </Header>
# <Body>
# some stuff
# </Body>
# </Envelope>
So you can now find the Body node this way:
print(dom.find("Body"))
# <Element Body at 8506cd4>
Try using Xpath:
dom.xpath("//*[local-name() = 'Body']")
Taken (and simplified) from this page, under "The xpath() method" section
You're showing the result of the repr() call. When you programmatically move through the tree, you can simply choose to ignore the namespace.
The last solution from https://bitbucket.org/olauzanne/pyquery/issue/17 can help you to avoid namespaces with little effort
apply
xml.replace(' xmlns:', ' xmlnamespace:')
to your xml before using pyquery so lxml will ignore namespaces
In your case, try xml.replace(' xmlns="', ' xmlnamespace="')
. However, you might need something more complex if the string is expected in the bodies as well.