Get attribute of complex element using lxml

问题

I have a simple file XML like below:

    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" />BMW</brandName>
      <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
      </maxspeed>

I want to parse it using lxml and get the value of it: With brandName, it just need:

    'brand_name'  : m.findtext(NS+'brandName')

If I want to get into abbrev attribute of it.

    'brand_name'  : m.findtext(NS+'brandName').attrib['abbrev']

With maxspeed, i can get the value of maxspeed by:

    'maxspeed_value'                  : m.findtext(NS+'maxspeed/value'),

or:

    'maxspeed_value'                  : m.find(NS+'maxspeed/value').text,

Now, I want to get the attribute of unit inside , I have tried a lot of different way but I'm failed. The error most of time is:

    'NoneType' object has no attribute 'attrib'

Here are several ways I tried and it failed:

    'maxspeed_unit'                  : m.find(NS+'maxspeed/value').attrib['abbrev'],
    'maxspeed_unit'                  : (m.find(NS+'maxspeed/value'))get('abbrev'),

Could you please give me some hint why it doesn't work? Thank you very much!

UPDATE XML:

    <Car xmlns="http://example.com/vocab/xml/cars#">
     <dateStarted>2011-02-05</dateStarted>
     <dateSold>2011-02-13</dateSold>
    <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" />BMW</brandName>
      <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
      </maxspeed>
      <route type="http://example.com/codes/routes#" abbrev="HW" value="Highway" >Highway</route>
      <power>
        <value>180</value>
        <unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
      </power>
      <frequency type="http://example.com/codes/frequency#" value="daily" >Daily</frequency>  
    </Car>

回答1:

The .find method on an lxml Element will only search the direct sub-children of that element. so for example in this xml:

<root>
    <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW">BMW</brandName>
    <maxspeed>
        <value>250</value>
        <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
    </maxspeed>
</root>

You can use the root Elements .find method to locate the brandname element, or the maxspeed element, but the search will not traverse inside these inner elements.

So you could for example do something like this:

root.find('maxspeed').find('unit') #returns the unit Element

From this returned element you can access the attributes.

If you'd like to search through all the elements within an XML doc, you can use the .iter() method. So for the previous example you could say:

for element in root.iter(tag='unit'):
    print element #This would print all the unit elements in the document.

EDIT: Here is a small fully functional example using the xml you provided:

import lxml.etree
from StringIO import StringIO

def ns_join(element, tag, namespace=None):
    '''Joins the namespace and tag together, and
    returns the fully qualified name.
    @param element - The lxml.etree._Element you're searching
    @param tag - The tag you're joining
    @param namespace - (optional) The Namespace shortname default is None'''

    return '{%s}%s' % (element.nsmap[namespace], tag)

def parse_car(element):
    '''Parse a car element, This will return a dictionary containing
    brand_name, maxspeed_value, and maxspeed_unit'''

    maxspeed = element.find(ns_join(element,'maxspeed'))
    return { 
        'brand_name' : element.findtext(ns_join(element,'brandName')), 
        'maxspeed_value' : maxspeed.findtext(ns_join(maxspeed,'value')), 
        'maxspeed_unit' : maxspeed.find(ns_join(maxspeed, 'unit')).attrib['abbrev']
        }

#Create the StringIO object to feed to the parser.
XML = StringIO('''
<Reports>
    <Car xmlns="http://example.com/vocab/xml/cars#">
        <dateStarted>2011-02-05</dateStarted>
        <dateSold>2011-02-13</dateSold>
        <name type="http://example.com/codes/bmw#" abbrev="X6" value="BMW X6" >BMW X6</name>
        <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
        <maxspeed>
            <value>250</value>
            <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
        </maxspeed>
        <route type="http://example.com/codes/routes#" abbrev="HW" value="Highway" >Highway</route>
        <power>
            <value>180</value>
            <unit type="http://example.com/codes/units#" value="powerhorse" abbrev="ph" />
        </power>
        <frequency type="http://example.com/codes/frequency#" value="daily" >Daily</frequency>  
    </Car>
</Reports>
''')

#Get the root element object of the xml
car_root_element = lxml.etree.parse(XML).getroot()

# For each 'Car' tag in the root element,
# we want to parse it and save the list as cars
cars = [ parse_car(element) 
    for element in car_root_element.iter() if element.tag.endswith('Car')]

print cars

Hope it helps.

回答2:

import lxml.etree as ET
content='''
<Car xmlns="http://example.com/vocab/xml/cars#">
 <brandName type="http://example.com/codes/bmw#" abbrev="BMW" value="BMW" >BMW</brandName>
   <maxspeed>
     <value>250</value>
     <unit type="http://example.com/codes/units#" value="miles per hour" abbrev="mph" />
   </maxspeed>
 </Car>
'''

doc=ET.fromstring(content)
NS = 'http://example.com/vocab/xml/cars#'
# print(ET.tostring(doc,pretty_print=True))
for x in doc.xpath('//ns:maxspeed/ns:unit/@abbrev',namespaces={'ns': NS}):
    print(x)

yields

mph

来源：https://stackoverflow.com/questions/8161530/get-attribute-of-complex-element-using-lxml

标签

python

lxml

elementtree