Lost in XML and Python

Question

Hi, I have started learning Python and want to use it to modify an XML file.

I have been looking for information on the best approach, but frankly I got a little lost. There are so many ways of manipulating XML files (ElementTree, lxml, minidom, etc.). Could someone point me in the right direction, or to some code I can wrap my head around? I have started experimenting with lxml but haven't gotten any further than printing all the elements.

Here is what I am trying to do:

1. Read a line from the CSV file and load its Label and FullPath.
2. Look in the XML file for the ITEM with the matching FullPath.
3. Change Flag1 for that ITEM to TRUE.
4. Change Flag2 and Flag3 for that ITEM to FALSE.
5. Change the Label for that ITEM to the Label from the CSV file.
6. Write out new.xml.
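One possible way to wire these six steps together using only the standard library (csv plus xml.etree.ElementTree) is sketched below. It is a sketch under assumptions: it relies on the ITEM layout shown further down, treats the CSV's FullPath as an ITEM's Path text followed by its file text, and the function name relabel is hypothetical. The step list sets Flag1 to TRUE, while the desired output further down shows Flag1 FALSE and Flag3 TRUE, so the flag values are kept in one dict that is easy to change. Loading the whole CSV into a dict first means the XML is walked only once rather than once per CSV line.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Flag values for every matched ITEM, following the step list above.
NEW_FLAGS = {'Flag1': 'TRUE', 'Flag2': 'FALSE', 'Flag3': 'FALSE'}


def relabel(xml_source, csv_file, out_file):
    """Hypothetical helper: apply CSV labels to matching ITEMs, write result."""
    # Step 1: map FullPath -> Label (';'-delimited file with a header row).
    labels = {row['FullPath']: row['Label']
              for row in csv.DictReader(csv_file, delimiter=';')}

    tree = ET.parse(xml_source)
    for item in tree.getroot().iter('ITEM'):
        # Step 2: assume FullPath == <Path> text + <file> text.
        full_path = item.findtext('Path', '') + item.findtext('file', '')
        label = labels.get(full_path)
        if label is None:
            continue  # no CSV row for this ITEM; leave it unchanged
        # Steps 3 and 4: overwrite the flags.
        for tag, value in NEW_FLAGS.items():
            item.find(tag).text = value
        # Step 5: replace the label.
        item.find('Label').text = label

    # Step 6: write the modified tree (a file name or a text file object).
    tree.write(out_file, encoding='unicode')


# In-memory stand-ins for mydata.xml and the CSV file.
xml_in = io.StringIO("""<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>RED</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>""")
csv_in = io.StringIO("Label;FullPath\nYELLOW;C:\\test\\test.png\n")
out = io.StringIO()

relabel(xml_in, csv_in, out)
result = out.getvalue()
print(result)
```

With real files the call would look something like with open('labels.csv', newline='') as f: relabel('mydata.xml', f, 'new.xml'), where labels.csv is a placeholder name; ET.parse and tree.write accept either a file name or a file object.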

Below is my XML structure. Records like the two below repeat about 10,000 times in the file.

<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>  
    <Flag3>FALSE</Flag3>
    <Label>RED</Label>   <!-- (2) after finding (1), I need to change THIS (only this) -->
    <Path>C:\\test\\</Path>   <!-- (1) I need to find this -->
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>

So I have a root element, <ThisIsMyData>, containing lots of <ITEM> elements. Each ITEM has 7 subelements.
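To make the root / ITEM / subelement layout concrete, the short sketch below uses the standard-library ElementTree on one ITEM copied from the sample and turns each ITEM into a dict keyed by child tag. The variable name records is just an illustration.

```python
import xml.etree.ElementTree as ET

# One ITEM from the sample; the real file repeats this about 10,000 times.
xml_text = """<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>RED</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
</ThisIsMyData>"""

root = ET.fromstring(xml_text)
print(root.tag)  # ThisIsMyData

# Iterating over an element yields its direct children, so each ITEM
# becomes {child tag: child text} with one entry per subelement.
records = [{child.tag: child.text for child in item}
           for item in root.findall('ITEM')]
print(len(records[0]))                          # 7
print(records[0]['Path'] + records[0]['file'])  # C:\test\test.png
```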

This is what my CSV file looks like, and what I need my output to look like.

CSV file:

  Label;FullPath
  YELLOW;C:\\test\\test.png
  YELLOW;c:\\test\\test2\\blue.png

Desired output:

<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>FALSE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>TRUE</Flag3>
    <Label>YELLOW</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>FALSE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>TRUE</Flag3>
    <Label>YELLOW</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>

Pastebin link in case the layout gets messed up:

http://pastebin.com/embed_js.php?i=QEx2ZGuY

I am trying ElementTree right now, using this example: http://pymotw.com/2/xml/etree/ElementTree/parse.html. I have managed to search the XML for a certain element name and print its contents, but I still do not see a way to find a matching element on the same level (a sibling).

from xml.etree import ElementTree

with open('mydata.xml', 'rt') as f:
    tree = ElementTree.parse(f)

# Find every <file> element at any depth and print its (tag, text) pair
for node in tree.findall('.//file'):
    FileName = node.tag, node.text
    print(FileName)

Output:

('file', 'test.png')
('file', 'blue.png')
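Regarding finding a matching element on the same level: ElementTree elements do not keep a reference to their parent, so rather than finding <file> and then looking sideways, it is easier to iterate over the ITEM elements and read their children. A minimal sketch of two ways to do that is below, with the flag elements left out of the sample for brevity; the [tag='text'] predicate is part of ElementTree's limited XPath support, and the names found and match are illustrative.

```python
import xml.etree.ElementTree as ET

xml_text = """<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Label>RED</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>"""

root = ET.fromstring(xml_text)

# Approach 1: loop over the ITEMs and compare a child's text,
# then read a sibling through the same parent ITEM.
found = []
for item in root.findall('ITEM'):
    if item.findtext('file') == 'blue.png':
        found.append(item.findtext('Label'))
print(found)  # ['Blue']

# Approach 2: let ElementPath select the ITEM whose <file> child
# has exactly this text.
match = root.find("ITEM[file='blue.png']")
print(match.findtext('Label'))  # Blue
```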

Answer 1:

Here's a quick example of how to do what I think you want, using lxml.etree and XPath.

from io import StringIO
from lxml import etree

xmlfile = StringIO("""
<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>RED</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>
""".strip())

datafile = StringIO("""
Label;FullPath
YELLOW;C:\\test\\test.png
YELLOW;c:\\test\\test2\\blue.png
""".strip())

# Read the "CSV": simple, no error checking, skip the header line.
filenameToLabel = {}
for csv_label, full_path in (x.strip().split(';') for x in datafile.readlines()[1:]):
    filenameToLabel[full_path] = csv_label

def first(seq, default=None):
    """XPath helper: return the first item of seq, or default if it is empty."""
    for item in seq:
        return item
    return default

doc = etree.XML(xmlfile.read())

for item in doc.xpath('//ITEM'):
    # Rebuild the CSV's FullPath from this ITEM's <Path> and <file> children
    item_filename = (first(item.xpath('./Path/text()'), '').strip()
                     + first(item.xpath('./file/text()'), '').strip())
    label = filenameToLabel.get(item_filename)
    if label is not None:
        first(item.xpath('./Flag1')).text = 'TRUE'
        first(item.xpath('./Flag2')).text = 'FALSE'
        first(item.xpath('./Flag3')).text = 'FALSE'
        first(item.xpath('./Label')).text = label

print(etree.tostring(doc, encoding='unicode'))

This prints:

<ThisIsMyData>
  <ITEM>
    <Number>0</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>YELLOW</Label>
    <Path>C:\test\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Number>1</Number>
    <Flag1>TRUE</Flag1>
    <Flag2>FALSE</Flag2>
    <Flag3>FALSE</Flag3>
    <Label>YELLOW</Label>
    <Path>c:\test\test2\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>
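As a variation on this answer, the match itself can be pushed into a single XPath expression: concat(Path, file) builds the full path for each ITEM, and lxml binds the $full_path variable from a keyword argument of the same name, so backslashes in the value need no quoting. This sketch also covers step 6 of the question (writing new.xml), which the answer leaves as a print. The variable name full_path is illustrative, and the sample is trimmed to the relevant children.

```python
from lxml import etree

doc = etree.XML("""<ThisIsMyData>
  <ITEM>
    <Label>RED</Label>
    <Path>C:\\test\\</Path>
    <file>test.png</file>
  </ITEM>
  <ITEM>
    <Label>Blue</Label>
    <Path>c:\\test\\test2\\</Path>
    <file>blue.png</file>
  </ITEM>
</ThisIsMyData>""")

# concat(Path, file) joins the string values of the ITEM's <Path> and <file>
# children; $full_path is bound from the keyword argument of the same name.
matches = doc.xpath('//ITEM[concat(Path, file) = $full_path]',
                    full_path='c:\\test\\test2\\blue.png')
for item in matches:
    item.find('Label').text = 'YELLOW'

# Write the modified document to new.xml (step 6 in the question).
doc.getroottree().write('new.xml', xml_declaration=True, encoding='utf-8')
```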

Answer 2:

First of all, use Python's csv module to get your data from the CSV file. A plain string split will also work fine if the data is not big.
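A minimal sketch of reading the question's semicolon-separated file with the csv module, assuming the header row shown in the question; io.StringIO stands in for the real file, whose name (labels.csv in the comment) is hypothetical.

```python
import csv
import io

# Stand-in for open('labels.csv', newline='') with the question's data.
csv_file = io.StringIO("Label;FullPath\n"
                       "YELLOW;C:\\test\\test.png\n"
                       "YELLOW;c:\\test\\test2\\blue.png\n")

# The file uses ';' instead of ',', so pass delimiter=';'.
# DictReader takes the column names from the header row.
labels = {row['FullPath']: row['Label']
          for row in csv.DictReader(csv_file, delimiter=';')}
print(labels)
```

Keying the dict by FullPath makes each later lookup from an ITEM a single dictionary access.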

Then create your XML using etree.XML.

Example:

>>> from lxml import etree
>>> csv_value = 'C:\\test\\'
>>> st = '<document>' + '<Flag1>FALSE</Flag1>' + '<Flag2>FALSE</Flag2>' + '<Path>' + csv_value + '</Path>' + '</document>'
>>> tree = etree.XML(st)
>>> etree.tostring(tree)
b'<document><Flag1>FALSE</Flag1><Flag2>FALSE</Flag2><Path>C:\\test\\</Path></document>'

Fetching csv_value is left to you as an exercise.
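One caveat with building the string by concatenation is that a value containing & or < produces malformed XML. A sketch of the same document built with Element and SubElement instead is below, shown with the standard-library ElementTree (lxml.etree provides the same Element, SubElement and tostring calls); the variable name xml_out is illustrative.

```python
import xml.etree.ElementTree as ET  # lxml.etree offers the same calls

csv_value = 'C:\\test\\'  # would come from the CSV in a real script

document = ET.Element('document')
ET.SubElement(document, 'Flag1').text = 'FALSE'
ET.SubElement(document, 'Flag2').text = 'FALSE'
# Assigning .text lets the library escape &, < and > in the value,
# which plain string concatenation does not do.
ET.SubElement(document, 'Path').text = csv_value

xml_out = ET.tostring(document, encoding='unicode')
print(xml_out)
```

With a value such as R&D, the concatenated string makes etree.XML raise a parse error, whereas this version serialises it as R&amp;D.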

Also take a look at this question.

Answer 3:

I find that Beautiful Soup, and its sister, Beautiful Stone Soup, have really good, terse, example-based documentation that lends itself to diving in and trying things out on real world examples.

But I've also heard that ElementTree is considered by some to be the gold standard in Python.

Source: https://stackoverflow.com/questions/9062394/lost-in-xml-and-python

Tags: python, xml, csv, lxml