Question
Hi, I have started learning Python and want to use it to modify an XML file.
I have been looking for information on the best approach, but frankly I got a little lost. There are so many ways of manipulating XML files: ElementTree, lxml, minidom, and so on. Could someone point me in the right direction, or to some code I can wrap my head around? I have started experimenting with lxml but haven't gotten any further than printing all elements yet.
Here is what I am trying to do :
- Read a line from the csv file. Load in Label and FullPath.
- Look in the XML file for the ITEM with a matching FullPath
- Change the FLAG1 for that ITEM to TRUE
- Change the FLAG2 and FLAG3 for that ITEM to FALSE
- Change the Label for that ITEM to the Label from the CSV file.
- Write out new.xml
Below is my XML structure. The two records below repeat about 10,000 times in the file.
<ThisIsMyData>
<ITEM>
<Number>0</Number>
<Flag1>TRUE</Flag1>
<Flag2>FALSE</Flag2>
<Flag3>FALSE</Flag3>
<Label>RED</Label> <<-2- After finding 1 I need to change THIS(only this)
<Path>C:\\test\\</Path> <-1- I need to find this
<file>test.png</file>
</ITEM>
<ITEM>
<Number>1</Number>
<Flag1>TRUE</Flag1>
<Flag2>FALSE</Flag2>
<Flag3>FALSE</Flag3>
<Label>Blue</Label>
<Path>c:\\test\\test2\\</Path>
<file>blue.png</file>
</ITEM>
</ThisIsMyData>
So I have a root (ThisIsMyData), then a lot of ITEM elements, each with 7 subelements.
This is what my CSV file looks like, followed by the output I need. CSV file:
Label;FullPath
YELLOW;C:\\test\\test.png
YELLOW;c:\\test\\test2\\blue.png
<ThisIsMyData>
<ITEM>
<Number>0</Number>
<Flag1>FALSE</Flag1>
<Flag2>FALSE</Flag2>
<Flag3>TRUE</Flag3>
<Label>YELLOW</Label>
<Path>C:\\test\\</Path>
<file>test.png</file>
</ITEM>
<ITEM>
<Number>1</Number>
<Flag1>FALSE</Flag1>
<Flag2>FALSE</Flag2>
<Flag3>TRUE</Flag3>
<Label>YELLOW</Label>
<Path>c:\\test\\test2\\</Path>
<file>blue.png</file>
</ITEM>
</ThisIsMyData>
Pastebin link in case layout gets messed up :
http://pastebin.com/embed_js.php?i=QEx2ZGuY
I am trying ElementTree right now, using this example: http://pymotw.com/2/xml/etree/ElementTree/parse.html. I have managed to search the XML for a certain element name and print its contents, but I still do not see a way of finding a matching element on the same level.
from xml.etree import ElementTree
with open('mydata.xml', 'rt') as f:
    tree = ElementTree.parse(f)
# filelist = ElementTree.ElementTree.find()
# Print the tag and text of every <file> element in the tree.
for node in tree.findall('.//file'):
    FileName = node.tag, node.text
    print(FileName)
Output :
('file', 'test.png')
('file', 'blue.png')
Answer 1:
Here's a quick example of how to do what I think you want, using lxml.etree and XPath.
from io import StringIO
from lxml import etree
xmlfile = StringIO("""
<ThisIsMyData>
<ITEM>
<Number>0</Number>
<Flag1>TRUE</Flag1>
<Flag2>FALSE</Flag2>
<Flag3>FALSE</Flag3>
<Label>RED</Label>
<Path>C:\\test\\</Path>
<file>test.png</file>
</ITEM>
<ITEM>
<Number>1</Number>
<Flag1>TRUE</Flag1>
<Flag2>FALSE</Flag2>
<Flag3>FALSE</Flag3>
<Label>Blue</Label>
<Path>c:\\test\\test2\\</Path>
<file>blue.png</file>
</ITEM>
</ThisIsMyData>
""".strip())
datafile = StringIO("""
Label;FullPath
YELLOW;C:\\test\\test.png
YELLOW;c:\\test\\test2\\blue.png
""".strip())
# Read "csv". Simple, no error checking, skip first line.
filenameToLabel = {}
for label, full_path in (x.strip().split(';') for x in datafile.readlines()[1:]):
    filenameToLabel[full_path] = label
def first(seq, default=None):
    """XPath helper: return the first item of seq, or default if seq is empty."""
    for item in seq:
        return item
    return default
doc = etree.XML(xmlfile.read())
for item in doc.xpath('//ITEM'):
    # Path and file together form the FullPath used as the lookup key.
    item_filename = (first(item.xpath('./Path/text()'), '').strip()
                     + first(item.xpath('./file/text()'), '').strip())
    label = filenameToLabel.get(item_filename)
    if label is not None:
        first(item.xpath('./Flag1')).text = 'TRUE'
        first(item.xpath('./Flag2')).text = 'FALSE'
        first(item.xpath('./Flag3')).text = 'FALSE'
        first(item.xpath('./Label')).text = label
print(etree.tostring(doc, encoding='unicode'))
Yields
<ThisIsMyData>
<ITEM>
<Number>0</Number>
<Flag1>TRUE</Flag1>
<Flag2>FALSE</Flag2>
<Flag3>FALSE</Flag3>
<Label>YELLOW</Label>
<Path>C:\test\</Path>
<file>test.png</file>
</ITEM>
<ITEM>
<Number>1</Number>
<Flag1>TRUE</Flag1>
<Flag2>FALSE</Flag2>
<Flag3>FALSE</Flag3>
<Label>YELLOW</Label>
<Path>c:\test\test2\</Path>
<file>blue.png</file>
</ITEM>
</ThisIsMyData>
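The example prints the result rather than writing new.xml, which was the last step in the question. A minimal sketch of that step follows. It uses the standard library's xml.etree.ElementTree so it runs without lxml installed; the helper name save_xml, the tiny sample document, and the temporary output directory are assumptions made for illustration only.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def save_xml(root, path):
    """Write the tree rooted at `root` to `path` as UTF-8 with an XML declaration."""
    ET.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)


# Tiny stand-in document; in practice `root` would be the updated tree.
root = ET.fromstring('<ThisIsMyData><ITEM><Label>YELLOW</Label></ITEM></ThisIsMyData>')

# Write into a temporary directory so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), 'new.xml')
save_xml(root, out_path)

# Read the file back to confirm the content survived the round trip.
reloaded = ET.parse(out_path).getroot()
print(reloaded.findtext('ITEM/Label'))  # prints: YELLOW
```

lxml offers a similar write method on its own ElementTree objects, so the same idea should carry over to the lxml-based code.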
Answer 2:
First of all, use Python's csv module to get your data from the CSV file. A plain string split will also work fine if the data is not big.
Then create your XML using etree.XML.
Example:
>>> from lxml import etree
>>> csv_value = 'C:\\test\\'
>>> st = '<document>'+'<Flag1>FALSE</Flag1>' + '<Flag2>FALSE</Flag2>'+'<Path>' + csv_value + '</Path>' + '</document>'
>>> tree = etree.XML(st)
>>> etree.tostring(tree)
'<document><Flag1>FALSE</Flag1><Flag2>FALSE</Flag2><Path>C:\\test\\</Path></document>'
Fetching csv_value is left to you as an exercise.
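For what it's worth, here is a rough sketch of how the csv module could read the semicolon-delimited file from the question. The function name load_labels is made up for illustration, and an in-memory StringIO stands in for the real file; with a file on disk you would pass something like open('labels.csv', newline='') instead, where labels.csv is a hypothetical filename.

```python
import csv
import io


def load_labels(csv_file):
    """Map each FullPath to its Label, reading a semicolon-delimited file object."""
    # DictReader takes the column names from the header row (Label;FullPath).
    reader = csv.DictReader(csv_file, delimiter=';')
    return {row['FullPath']: row['Label'] for row in reader}


# In-memory stand-in for the CSV file, using the sample rows from the question.
sample = io.StringIO(
    "Label;FullPath\n"
    "YELLOW;C:\\test\\test.png\n"
    "YELLOW;c:\\test\\test2\\blue.png\n"
)
labels = load_labels(sample)
print(labels['C:\\test\\test.png'])  # prints: YELLOW
```

Keying the dictionary on FullPath makes the later lookup for each ITEM a single labels.get(...) call.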
Also take a look at this question.
Answer 3:
I find that Beautiful Soup, and its sister, Beautiful Stone Soup, have really good, terse, example-based documentation that lends itself to diving in and trying things out on real world examples.
But I've also heard that ElementTree is considered by some to be the gold standard in Python.
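ElementTree ships with Python's standard library, so here is a minimal sketch of how the matching step from the question might look with it. The function name update_items and the one-item sample document are assumptions for illustration. The flag values follow the question's bullet list (Flag1 to TRUE, Flag2 and Flag3 to FALSE), and every ITEM is assumed to contain all of the child elements shown in the question.

```python
import xml.etree.ElementTree as ET


def update_items(root, labels):
    """Relabel every ITEM whose Path + file is a key in `labels`; return the count."""
    changed = 0
    for item in root.iter('ITEM'):
        # Path and file are children of the same ITEM, so both are looked up on `item`.
        full_path = ((item.findtext('Path') or '').strip()
                     + (item.findtext('file') or '').strip())
        label = labels.get(full_path)
        if label is None:
            continue  # no CSV row for this ITEM, leave it untouched
        # Assumes Flag1, Flag2, Flag3 and Label exist; find() returns None otherwise.
        item.find('Flag1').text = 'TRUE'
        item.find('Flag2').text = 'FALSE'
        item.find('Flag3').text = 'FALSE'
        item.find('Label').text = label
        changed += 1
    return changed


# One-item stand-in for the real document.
root = ET.fromstring(
    "<ThisIsMyData><ITEM><Number>0</Number><Flag1>FALSE</Flag1>"
    "<Flag2>TRUE</Flag2><Flag3>TRUE</Flag3><Label>RED</Label>"
    "<Path>C:\\test\\</Path><file>test.png</file></ITEM></ThisIsMyData>"
)
count = update_items(root, {'C:\\test\\test.png': 'YELLOW'})
print(count, root.findtext('ITEM/Label'))  # prints: 1 YELLOW
```

Iterating over ITEM elements and querying their children sidesteps the "matching element on the same level" problem: instead of finding a file element and then hunting for its siblings, each lookup starts from the shared parent.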
Source: https://stackoverflow.com/questions/9062394/lost-in-xml-and-python