Question
Using Python 2.7 and lxml, how do I modify XML elements with multiple values?
E.g.
<Title>
  <Playcount>1</Playcount>
  <Genre>Adventure</Genre>
  <Genre>Comedy</Genre>
  <Genre>Action</Genre>
</Title>
It is straightforward to modify Playcount, as it has a single value. How do I modify Genre, which has multiple values?
For example:
- How do I delete all but the first genre?
- How do I add a genre?
- How do I change every Baseball genre to Sports?
Thanks.
Answer 1:
Like this:
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.fromstring("""<Title>
<Playcount>1</Playcount>
<Genre>Adventure</Genre>
<Genre>Comedy</Genre>
<Genre>Action</Genre>
<someTag>Text</someTag>
</Title>""", parser=parser)
New playcount:
playcount = tree.find('Playcount')
playcount.text = "2"
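Because `.text` is always a string (or `None` for an empty element), a counter such as Playcount has to be converted before doing arithmetic on it. A minimal sketch that increments the value instead of overwriting it, using a hypothetical one-element sample document:

```python
from lxml import etree

# Hypothetical sample document containing only the counter
title = etree.fromstring("<Title><Playcount>1</Playcount></Title>")
playcount = title.find('Playcount')

# .text is a string (None if empty), so convert, add, and convert back
playcount.text = str(int(playcount.text or 0) + 1)
print(playcount.text)  # 2
```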
Delete all genres except the first:
title = tree.xpath('/Title')[0]
first_genre = title.xpath('Genre[1]')[0]
for element in title.xpath('Genre'):
    # Keep the first <Genre>; remove every other one from its parent
    if element is not first_genre:
        element.getparent().remove(element)
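The same deletion can also be pushed into the XPath expression itself. A minimal sketch on a trimmed, hypothetical copy of the sample document, where the predicate `position() > 1` selects every `<Genre>` except the first. Since `xpath()` returns a plain list, removing elements while looping over it is safe:

```python
from lxml import etree

# Trimmed copy of the sample document from the question
title = etree.fromstring(
    "<Title><Playcount>1</Playcount>"
    "<Genre>Adventure</Genre><Genre>Comedy</Genre><Genre>Action</Genre>"
    "</Title>"
)

# Genre[position() > 1] matches every <Genre> child except the first one
for genre in title.xpath('Genre[position() > 1]'):
    title.remove(genre)

print([g.text for g in title.findall('Genre')])  # ['Adventure']
```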
New genre:
genre = etree.Element("Genre")
genre.text = "New Genre"
tree.xpath('/Title/Genre[last()]')[0].addnext(genre)
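If the new genre may simply go at the end of `<Title>`, `etree.SubElement` is shorter; `addnext()` keeps it next to the existing genres when other tags follow them. A small sketch comparing both on hypothetical sample data:

```python
from lxml import etree

xml = "<Title><Genre>Adventure</Genre><someTag>Text</someTag></Title>"

# Option 1: SubElement appends the new <Genre> as the last child of <Title>
appended = etree.fromstring(xml)
etree.SubElement(appended, "Genre").text = "New Genre"

# Option 2: addnext() inserts the new <Genre> directly after the last existing one
inserted = etree.fromstring(xml)
genre = etree.Element("Genre")
genre.text = "New Genre"
inserted.xpath('Genre[last()]')[0].addnext(genre)

print([child.tag for child in appended])  # ['Genre', 'someTag', 'Genre']
print([child.tag for child in inserted])  # ['Genre', 'Genre', 'someTag']
```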
Result:
print etree.tostring(tree, pretty_print=True)
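This answer does not cover the third question (changing every Baseball genre to Sports). A minimal sketch in the same lxml style, assuming the genre name is the element's entire text and using hypothetical sample data:

```python
from lxml import etree

# Hypothetical sample data with two Baseball genres
title = etree.fromstring(
    "<Title><Genre>Baseball</Genre><Genre>Comedy</Genre>"
    "<Genre>Baseball</Genre></Title>"
)

# Select only the <Genre> elements whose text is exactly "Baseball"
for genre in title.xpath("Genre[. = 'Baseball']"):
    genre.text = "Sports"

print([g.text for g in title.findall('Genre')])  # ['Sports', 'Comedy', 'Sports']
```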
Answer 2:
Consider an XSLT solution when you need to manipulate the original XML files. As I mentioned on this PHP question, XSLT (whose script is itself a well-formed XML file) is a special-purpose, declarative language that can handle several tasks in one script, as illustrated below.
Most general-purpose languages, including Python (lxml module), PHP (xsl extension), Java (javax.xml), Perl (libxml), C# (System.Xml) and VB (MSXML), provide XSLT 1.0 processors. External processors such as Xalan and Saxon (the latter can run XSLT 2.0 and, more recently, 3.0) are also available, and Python can of course call them with subprocess.call().
Below are the XSLT script and the Python script that loads it. As mentioned above, the XSLT itself is portable to other languages and platforms.
XSLT script (save as .xsl or .xslt)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>
  <!-- IDENTITY TRANSFORM (COPY CONTENT AS IS) -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- CHANGE PLAYCOUNT -->
  <xsl:template match="Playcount">
    <xsl:copy>newvalue</xsl:copy>
  </xsl:template>
  <!-- EMPTY TEMPLATE TO REMOVE NODES BY POSITION -->
  <xsl:template match="Genre[position() > 1]"></xsl:template>
  <!-- ADD NEW GENRE -->
  <xsl:template match="Title">
    <xsl:copy>
      <xsl:apply-templates/>
      <Genre>new</Genre>
    </xsl:copy>
  </xsl:template>
  <!-- CHANGE BASEBALL GENRE TO SPORTS (match the Genre element itself, not its parent Title) -->
  <xsl:template match="Genre[.='Baseball']">
    <xsl:copy>Sports</xsl:copy>
  </xsl:template>
</xsl:transform>
Python Script
import lxml.etree as ET
# LOAD XML AND XSLT FILES
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')
# TRANSFORM INTO DOM OBJECT
transform = ET.XSLT(xslt)
newdom = transform(dom)
# OUTPUT TO PRETTY PRINT STRING
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
print(tree_out.decode("utf-8"))
# SAVE AS FILE (binary write mode, since tostring() with an encoding returns bytes)
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
Result (all of the questions above are handled, except Baseball, which is not present in the posted data):
<?xml version='1.0' encoding='UTF-8'?>
<Title>
  <Playcount>newvalue</Playcount>
  <Genre>Adventure</Genre>
  <Genre>new</Genre>
</Title>
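To try a stylesheet without creating Input.xml and XSLTScript.xsl, both documents can be parsed from strings with `etree.XML()`. The sketch below embeds a trimmed stylesheet (identity copy, removal of every Genre after the first, and a `Genre[.='Baseball']` template that renames matching genres) and runs it on hypothetical sample input where Baseball is present:

```python
from lxml import etree

# Trimmed stylesheet: identity copy, drop every <Genre> after the first,
# and rename Baseball genres to Sports
xslt_root = etree.XML('''\
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="Genre[position() > 1]"/>
  <xsl:template match="Genre[.='Baseball']">
    <xsl:copy>Sports</xsl:copy>
  </xsl:template>
</xsl:transform>''')

# Hypothetical input, parsed from a string instead of Input.xml
doc = etree.XML('<Title><Playcount>1</Playcount>'
                '<Genre>Baseball</Genre><Genre>Comedy</Genre></Title>')

transform = etree.XSLT(xslt_root)
result = transform(doc)

print(etree.tostring(result))
```

If a Baseball genre could also appear after the first genre, both Genre templates would match it with the same default priority (0.5), which XSLT 1.0 treats as a conflict; giving one of them an explicit `priority` attribute makes the intended behaviour unambiguous.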
Source: https://stackoverflow.com/questions/35977447/python-elementtree-xml-modifying-elements-with-multiple-values