python: error with basic XML parsing (with lxml)

Submitted by ぐ巨炮叔叔 on 2019-12-11 05:51:57

Question


I am trying to parse an XML file with python using lxml, but get an error on basic attempts. I use this post and the lxml tutorials to bootstrap.

My XML file is basically built from records below (I trimmed it down so that it is easier to read):

<?xml version="1.0" ?>
<?xml-stylesheet href="file:///usr/share/nmap/nmap.xsl" type="text/xsl"?>
<nmaprun scanner="nmap" args="nmap -sV -p135,12345 -oX 10.232.0.0.16.xml 10.232.0.0/16" start="1340201347" startstr="Wed Jun 20 16:09:07 2012" version="5.21" xmloutputversion="1.03">
<host>
  <hostnames>
    <hostname name="host1.example.com" type="PTR"/>
  </hostnames>
</host>
</nmaprun>

I run it through this complicated script:

from lxml import etree

d = etree.parse("myfile.xml")
for host in d.findall("host"):
    aa = host.find("hostnames/hostname")
    print(aa.attrib["name"])

I get AttributeError: 'NoneType' object has no attribute 'attrib' on the print line. I checked the value of d, host and aa and they are all defined as Elements.

Upfront apologies if this is something obvious (and it probably is).

EDIT: I added the header of the XML file as requested (I am still reading and rereading the answers :))

Thanks!


Answer 1:


Though it would make more sense to use XPath, your code already works fine when standing alone, so long as one handles the case where a host has no hostnames found:

import lxml.etree

doc = lxml.etree.XML("""
  <nmaprun>
    <host>
      <hostnames>
        <hostname name="host1.example.com" type="PTR"/>
      </hostnames>
    </host>
  </nmaprun>""")
for host in doc.findall('host'):
  host_el = host.find('hostnames/hostname')
  # find() returns None when the host has no matching <hostname>
  if host_el is not None:
    print(host_el.attrib['name'])

With XPath (doc.xpath() rather than doc.find() or doc.findall()), one could do better, filtering only for hostnames with a name and thus avoiding the faulty records altogether:

  • host[hostnames/hostname/@name] finds hosts that have at least one hostnames element containing a hostname with a name attribute.
  • //hostnames/hostname/@name will directly return only the names themselves (if using lxml, exposing these as strings).
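The two expressions above can be tried out with a minimal sketch like the following. The sample document is an assumption made for illustration: its second host is deliberately left without a hostname, to mimic the kind of record that makes find() return None.

```python
from lxml import etree

# Illustrative sample (not the asker's real file): the second <host>
# has an empty <hostnames/>, so it has no <hostname> child at all.
XML = b"""<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <hostnames/>
  </host>
</nmaprun>"""

doc = etree.fromstring(XML)  # doc is the root <nmaprun> element

# Child <host> elements that have at least one hostname with a name attribute.
named_hosts = doc.xpath("host[hostnames/hostname/@name]")

# The name attribute values themselves; lxml returns them as strings.
names = doc.xpath("//hostnames/hostname/@name")

print(len(named_hosts))
print(names)
```

Because the second expression only ever yields attributes that exist, no None check is needed when iterating over the result.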



Answer 2:


You can solve this with an xpath expression.

d.xpath('//hostname/@name') # thank you for comment

Alternatively

for host in d.xpath('//hostname'):
    # get() returns None instead of raising when an attribute is absent
    print(host.get('name'), host.get('whatever else etc...'))



Answer 3:


It looks like you might have some <host> elements that have either no <hostnames> or no <hostname> sub-element defined.

As suggested in a comment to your question by @Charles Duffy, you need to check that your call to find() found an element:

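for host in d.findall("host"):
    aa = host.find("hostnames/hostname")
    # Compare with None explicitly: an lxml element without children
    # is falsy, so a bare "if aa:" would skip valid <hostname/> elements.
    if aa is not None:
        print(aa.attrib["name"])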
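One detail about this check is worth spelling out: in lxml, the truth value of an element reflects whether it has children, not whether find() located it. A self-closing <hostname .../> therefore evaluates as false even when it was found, which is why comparing against None is the safer test. A minimal sketch of that behaviour, where the tag name nosuchtag is a hypothetical one used only to force a miss:

```python
import warnings
from lxml import etree

host = etree.fromstring(
    '<host><hostnames>'
    '<hostname name="host1.example.com" type="PTR"/>'
    '</hostnames></host>'
)
aa = host.find("hostnames/hostname")

with warnings.catch_warnings():
    # lxml emits a FutureWarning when an element is truth-tested.
    warnings.simplefilter("ignore")
    truthy = bool(aa)  # False: the element exists but has no children

found = aa is not None  # True: find() did locate the element

# "nosuchtag" is a hypothetical tag that is absent from the document.
missing = host.find("hostnames/nosuchtag")

print(truthy, found, missing is None)
```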


Source: https://stackoverflow.com/questions/11123536/python-error-with-basic-xml-parsing-with-lxml
