Parsing blank XML tags with LXML and Python

自闭症患者 2021-01-26 12:58

When parsing XML documents in the following format:

<Car>
    <Color>Blue</Color>
    <Make>Chevy</Make>
    <Model>Camaro</Model>
</Car>

        
  •  野性不改
    2021-01-26 13:55

    You're putting in a [text()] filter, which explicitly asks only for elements that have text nodes in them... and then you're unhappy when it doesn't give you elements without text nodes?

    Leave that filter out, and you'll get your Model element:

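    >>> import lxml.etree
    >>> s = '''<Root>
    ...   <Car>
    ...     <Color>Blue</Color>
    ...     <Make>Chevy</Make>
    ...     <Model/>
    ...   </Car>
    ... </Root>'''
    >>> e = lxml.etree.fromstring(s)
    >>> carData = e.xpath('Car/*')
    >>> carData
    [<Element Color at 0x...>, <Element Make at 0x...>, <Element Model at 0x...>]
    >>> dict((el.tag, el.text) for el in carData)
    {'Color': 'Blue', 'Make': 'Chevy', 'Model': None}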
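To make the effect of the filter concrete, here is a small sketch (assuming Python 3 with lxml installed, and an illustrative document shaped like the one above) that runs three XPath expressions against the same Car element. The predicate in Car/*[text()] keeps only children that contain a text node, so the empty <Model/> drops out; Car/* returns every child element; and Car/node() additionally returns the whitespace text nodes between the tags, which is why * is the safer choice if you go on to read .tag from the results. The last expression shows one optional way to turn the None from an empty tag into an empty string.

```python
from lxml import etree

# Illustrative document: same shape as the example above, with an empty <Model/>.
xml = b"""<Root>
  <Car>
    <Color>Blue</Color>
    <Make>Chevy</Make>
    <Model/>
  </Car>
</Root>"""

root = etree.fromstring(xml)

# [text()] keeps only child elements that contain a text node: <Model/> is skipped.
with_text = [el.tag for el in root.xpath('Car/*[text()]')]

# No filter: every child element, including the empty <Model/>.
all_children = [el.tag for el in root.xpath('Car/*')]

# node() also matches text nodes, here the indentation whitespace between tags.
nodes = root.xpath('Car/node()')
text_nodes = [n for n in nodes if isinstance(n, str)]

# Optional: map empty tags to '' instead of None.
cleaned = {el.tag: (el.text or '') for el in root.xpath('Car/*')}

print(with_text)                    # ['Color', 'Make']
print(all_children)                 # ['Color', 'Make', 'Model']
print(len(nodes), len(text_nodes))  # 7 4
print(cleaned)                      # {'Color': 'Blue', 'Make': 'Chevy', 'Model': ''}
```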
    

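    That said, if your immediate goal is to iterate over the nodes in the tree, you might consider using lxml.etree.iterparse() instead. It hands you each element as soon as it has been parsed, rather than making you build the whole tree first and then iterate over it with XPath, and for large inputs it can be much more memory-efficient. Note that iterparse still attaches every parsed element to a tree behind the scenes, so memory only stays bounded if you clear elements once you're done with them. (Think SAX, but without the insane and painful API.)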

    Implementing with iterparse could look like this:

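    import io
    import lxml.etree

    def get_cars(infile):
        in_car = False
        current_car = {}
        for event, element in lxml.etree.iterparse(infile, events=('start', 'end')):
            if event == 'start':
                # Only <Car> starts matter; an element's text may not be parsed
                # yet at 'start', so fields are read on their 'end' events.
                if element.tag == 'Car':
                    in_car = True
                    current_car = {}
                continue
            if not in_car:
                continue
            if element.tag == 'Car':
                # </Car> reached: hand back the record, then free the subtree
                # and any earlier, already-processed siblings.
                in_car = False
                yield current_car
                element.clear()
                while element.getprevious() is not None:
                    del element.getparent()[0]
                continue
            current_car[element.tag] = element.text

    xml = b'<Root><Car><Color>Blue</Color><Make>Chevy</Make><Model/></Car></Root>'
    for car in get_cars(infile=io.BytesIO(xml)):
        print(car)  # {'Color': 'Blue', 'Make': 'Chevy', 'Model': None}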
    

    ...it's more code, but (if we weren't using an in-memory buffer for the example) it could process a file much larger than could fit in memory, as long as finished elements are cleared as you go.
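If you're on lxml specifically, iterparse also accepts a tag keyword that restricts which elements it reports, which can shorten the generator considerably. The sketch below assumes every field you care about is a direct child of <Car>; the name get_cars_by_tag and the two-car sample are illustrative only. After reading each Car it clears the subtree and deletes earlier siblings, a common lxml idiom for keeping the in-memory tree from growing on large files.

```python
import io
from lxml import etree


def get_cars_by_tag(infile):
    """Yield one {child tag: child text} dict per <Car> (illustrative helper)."""
    # tag='Car' limits the reported events to <Car> elements; an 'end' event
    # fires only after the whole <Car> subtree, including child text, is parsed.
    for _event, car in etree.iterparse(infile, events=('end',), tag='Car'):
        yield {child.tag: child.text for child in car}
        # Release the finished subtree and any earlier, already-handled siblings
        # so the tree that iterparse builds behind the scenes stays small.
        car.clear()
        while car.getprevious() is not None:
            del car.getparent()[0]


sample = io.BytesIO(
    b'<Root>'
    b'<Car><Color>Blue</Color><Make>Chevy</Make><Model>Camaro</Model></Car>'
    b'<Car><Color>Blue</Color><Make>Chevy</Make><Model/></Car>'
    b'</Root>'
)
cars = list(get_cars_by_tag(sample))
for car in cars:
    print(car)
# {'Color': 'Blue', 'Make': 'Chevy', 'Model': 'Camaro'}
# {'Color': 'Blue', 'Make': 'Chevy', 'Model': None}
```

Compared with the start/end version, this leans on an lxml-specific option and only looks one level below Car, so nested fields would need extra handling.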
