Working with namespace while parsing XML using ElementTree

Submitted by ぐ巨炮叔叔 on 2020-02-05 06:35:06

Question


This is a follow-on question to Modify a XML using ElementTree.

My XML now uses namespaces. I tried to follow the answer at Parsing XML with namespace in Python via 'ElementTree' and came up with the following.

XML file.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
 <grandParent>
  <parent>
   <child>Sam/Astronaut</child>
  </parent>
 </grandParent>
</project>

My Python code, after reading Parsing XML with namespace in Python via 'ElementTree':

import xml.etree.ElementTree as ET

spaces={'xmlns':'http://maven.apache.org/POM/4.0.0','schemaLocation':'http://maven.apache.org/xsd/maven-4.0.0.xsd'}

tree = ET.parse("test.xml")
a=tree.find('parent')          
for b in a.findall('child', namespaces=spaces):
 if b.text.strip()=='Jay/Doctor':
    print "child exists"
    break
else:
    ET.SubElement(a,'child').text="Jay/Doctor"

tree.write("test.xml")

I get the error: AttributeError: 'NoneType' object has no attribute 'findall'


Answer 1:


There are two problems on this line:

a=tree.find('parent')          

First, <parent> is not an immediate child of the root element; it is a grandchild. The path to it looks like /project/grandParent/parent. To search for <parent>, try the XPath expression */parent or possibly .//parent.

Second, <parent> exists in the default namespace, so you won't be able to .find() it with just its simple name. You'll need to add the namespace.

Here are two equally valid calls to tree.find(), each of which should find the <parent> node:

a=tree.find('*/{http://maven.apache.org/POM/4.0.0}parent')
a=tree.find('*/xmlns:parent', namespaces=spaces)
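
As a minimal sketch of the two points above, the following parses a trimmed copy of the question's XML from a string and compares the lookups. The POM_XML name is made up for this illustration, and the spaces dictionary is reduced to the one entry that matters. The 'xmlns' key is only a label in that dictionary, so any prefix name would work.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory copy of the question's test.xml (attributes trimmed).
POM_XML = """<project xmlns="http://maven.apache.org/POM/4.0.0">
 <grandParent>
  <parent>
   <child>Sam/Astronaut</child>
  </parent>
 </grandParent>
</project>"""

spaces = {'xmlns': 'http://maven.apache.org/POM/4.0.0'}
root = ET.fromstring(POM_XML)

# Unqualified name at the wrong depth: nothing matches, so find() returns None.
missing = root.find('parent')

# Fully qualified name, searched among the grandchildren of the root.
by_uri = root.find('*/{http://maven.apache.org/POM/4.0.0}parent')

# Prefix form: 'xmlns' is looked up in the spaces dictionary.
by_prefix = root.find('*/xmlns:parent', namespaces=spaces)

# Descendant search at any depth below the root.
anywhere = root.find('.//xmlns:parent', namespaces=spaces)
```

Calling .findall() on missing is what produces the AttributeError from the question.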

Third, the call to findall() needs a namespace qualifier:

for b in a.findall('xmlns:child', namespaces=spaces):

Fourth, the call to create the new child element needs a namespace qualifier. There may be a way to use the shortcut name, but I couldn't find it. I had to use the long form of the name.

ET.SubElement(a,'{http://maven.apache.org/POM/4.0.0}child').text="Jay/Doctor"
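
One way to avoid typing the long form each time is to build the '{uri}tag' string from the namespace dictionary. The sketch below assumes a small helper named qualified, which is invented here and is not part of ElementTree.

```python
import xml.etree.ElementTree as ET

spaces = {'xmlns': 'http://maven.apache.org/POM/4.0.0'}

def qualified(tag, prefix='xmlns', namespaces=spaces):
    """Return tag in ElementTree's '{uri}tag' form (hypothetical helper)."""
    return '{%s}%s' % (namespaces[prefix], tag)

# Build a tiny namespaced element and append a child in the same namespace.
parent = ET.Element(qualified('parent'))
ET.SubElement(parent, qualified('child')).text = "Jay/Doctor"

# The new element is found by the same prefixed search used for parsed ones.
found = parent.findall('xmlns:child', namespaces=spaces)
```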

Finally, your XML output will look ugly (ElementTree invents a prefix such as ns0: and puts it on every element) unless you provide a default namespace:

tree.write('test.xml', default_namespace=spaces['xmlns'])
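
To see the difference without touching a file, this sketch writes the same small tree into two in-memory buffers, once with and once without default_namespace. The cut-down document and the NS, plain, and clean names are assumptions made for the illustration.

```python
import io
import xml.etree.ElementTree as ET

NS = 'http://maven.apache.org/POM/4.0.0'
root = ET.fromstring('<project xmlns="%s"><grandParent/></project>' % NS)
tree = ET.ElementTree(root)

# Without default_namespace, ElementTree generates a prefix for the namespace.
plain = io.BytesIO()
tree.write(plain)

# With default_namespace, elements in that namespace are written unprefixed.
clean = io.BytesIO()
tree.write(clean, default_namespace=NS)

print(plain.getvalue().decode())
print(clean.getvalue().decode())
```

Note that default_namespace only works when every element in the tree is namespace-qualified; an unqualified tag makes write() raise ValueError, which is another reason the new <child> must be created with the long name.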

Unrelated to the XML aspects, you copied my answer from the previous question incorrectly. The else lines up with the for, not with the if:

for ...
  if ...
else ...
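
Putting the for/else layout together with the namespace fixes, a possible version of the whole check looks like this. It is a sketch run against an in-memory copy of the XML rather than test.xml, and the ensure_child wrapper is a made-up name used only to make the two outcomes easy to compare.

```python
import xml.etree.ElementTree as ET

NS = 'http://maven.apache.org/POM/4.0.0'
spaces = {'xmlns': NS}

def ensure_child(parent, text):
    """Append a namespaced <child> with this text unless one already exists.

    Returns True if a new element was added. (Hypothetical helper.)
    """
    for b in parent.findall('xmlns:child', namespaces=spaces):
        if b.text and b.text.strip() == text:
            break  # found a match, so the else clause is skipped
    else:
        # Runs only when the loop finished without hitting break.
        ET.SubElement(parent, '{%s}child' % NS).text = text
        return True
    return False

root = ET.fromstring(
    '<project xmlns="%s"><grandParent><parent>'
    '<child>Sam/Astronaut</child></parent></grandParent></project>' % NS)
a = root.find('*/xmlns:parent', namespaces=spaces)

first = ensure_child(a, 'Jay/Doctor')   # no match yet, so a child is added
second = ensure_child(a, 'Jay/Doctor')  # already present, so nothing changes
children = [c.text for c in a.findall('xmlns:child', namespaces=spaces)]
```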


Source: https://stackoverflow.com/questions/25070180/working-with-namespace-while-parsing-xml-using-elementtree
