Python lxml: using iterparse to edit and output XML

爱⌒轻易说出口 submitted on 2019-12-22 11:09:22

Question


I've been experimenting with the lxml library for a little while, and maybe I'm misunderstanding it or missing something, but I can't figure out how to edit the document once I hit a certain XPath and then write the result back out as XML while still parsing element by element.

Say we have this xml as an example:

<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>

What I would like to do while parsing is, whenever I hit the XPath "/xml/items/pie", wrap the pie in a new item element, so the result looks like this:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
  </items>
</xml>

The output needs to be written to a file incrementally, as I hit each tag and edit the XML at certain XPaths. I could print the start tag, the text, any attributes, and then the end tag by hard-coding certain parts, but that would be very messy, and it would be nice to avoid that if possible.

Here's my guess at the code:

from lxml import etree

path=[]
count=0

context=etree.iterparse(file,events=('start','end'))
for event, element in context:
    if event=='start':
       path.append(element.tag)
       if '/'+'/'.join(path)=='/xml/items/pie':
          count+=1
          itemnode=etree.Element('item',id=str(count))
          itemnode.text=""
          element.addprevious(itemnode)#Not the right way to do it of course
          #write/print out xml here.
    else:
        element.clear()
        path.pop()

Edit: Also, I need to run through fairly big files, so I have to use iterparse.


Answer 1:


Here's a solution using iterparse(). The idea is to catch all tag "start" events, remember the parent (items) tag, then for every pie tag create an item tag and put the pie into it:

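from io import BytesIO
from lxml import etree
from lxml.etree import Element

data = """<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""

# iterparse() reads from a binary file-like object
stream = BytesIO(data.encode("utf-8"))
context = etree.iterparse(stream, events=("start", ))

for action, elem in context:
    if elem.tag == 'items':
        # remember the parent and restart the counter
        items = elem
        index = 1
    elif elem.tag == 'pie':
        item = Element('item', {'id': str(index)})
        # if the whitespace after </pie> is already parsed, keep it outside the wrapper
        item.tail, elem.tail = elem.tail, None
        # swap the pie for the new item, then nest the pie inside it
        items.replace(elem, item)
        item.append(elem)
        index += 1

# context.root is the root of the tree built while parsing
print(etree.tostring(context.root).decode())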

prints:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
   </items>
</xml>



Answer 2:


There is a cleaner way to make the modifications you need, provided the whole document fits in memory:

  • iterate over the pie elements
  • make an item element for each one
  • use replace() to swap the pie element for the item, then append the pie to the item

replace(self, old_element, new_element)

Replaces a subelement with the element passed as second argument.


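from lxml import etree
from lxml.etree import XMLParser, Element

data = """<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""


tree = etree.fromstring(data, parser=XMLParser())
items = tree.find('.//items')
# xpath() returns a plain list, so the tree can be modified while looping
for index, pie in enumerate(items.xpath('.//pie'), start=1):
    item = Element('item', {'id': str(index)})
    # the whitespace after </pie> is pie.tail; hand it to the item
    # so it stays outside the wrapper
    item.tail, pie.tail = pie.tail, None
    items.replace(pie, item)
    item.append(pie)

print(etree.tostring(tree, pretty_print=True).decode())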

prints:

<xml>
   <items>
      <item id="1"><pie>cherry</pie></item>
      <item id="2"><pie>apple</pie></item>
      <item id="3"><pie>chocolate</pie></item>
   </items>
</xml>



Answer 3:


I would suggest using an XSLT template, as it seems a better fit for this task. XSLT is a little tricky until you get used to it, but if all you want is to generate some output from XML, it is a great tool.
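
As a rough illustration of the XSLT route, here is a minimal sketch that runs a stylesheet through lxml's etree.XSLT. The stylesheet and the helper name wrap_pies are my own assumptions, not something from the question: an identity template copies everything unchanged, and a second template wraps each /xml/items/pie in an item whose id is its position among the pie siblings. Keep in mind that this loads the whole document, so it may not suit very large files.

```python
from lxml import etree

# Assumed stylesheet: identity copy plus one override for pie elements.
XSLT_SOURCE = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- identity template: copy attributes and nodes as they are -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- wrap each pie; id counts the pie siblings that come before it -->
  <xsl:template match="/xml/items/pie">
    <item id="{count(preceding-sibling::pie) + 1}">
      <xsl:copy-of select="."/>
    </item>
  </xsl:template>
</xsl:stylesheet>"""


def wrap_pies(xml_bytes):
    """Apply the stylesheet to xml_bytes and return the result as bytes."""
    transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
    result = transform(etree.fromstring(xml_bytes))
    return etree.tostring(result)


data = b"""<xml>
   <items>
      <pie>cherry</pie>
      <pie>apple</pie>
      <pie>chocolate</pie>
  </items>
</xml>"""

print(wrap_pies(data).decode())
```

The id is computed with count(preceding-sibling::pie) + 1 rather than position(), because with the identity template position() would also count the whitespace text nodes between the pie elements.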



Source: https://stackoverflow.com/questions/22493724/python-lxml-using-iterparse-to-edit-and-output-xml
