Python ElementTree XML Modifying Elements with Multiple Values

半世苍凉 submitted on 2019-12-12 02:56:20

Question


Using Python 2.7 and lxml, how do I modify XML elements with multiple values?

E.g.

    <Title>
      <Playcount>1</Playcount>
      <Genre>Adventure</Genre>
      <Genre>Comedy</Genre>
      <Genre>Action</Genre>
    </Title>

It is straightforward to modify Playcount, as it has a single value. How do I modify Genre, which has multiple values?

For example:

  1. How do I delete all but the first genre?

  2. How do I add a genre?

  3. How do I change every Baseball genre to Sports?

Thanks.


Answer 1:


Like this:

from lxml import etree

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.fromstring("""<Title>
    <Playcount>1</Playcount>
     <Genre>Adventure</Genre>
     <Genre>Comedy</Genre>
     <Genre>Action</Genre>
     <someTag>Text</someTag>
    </Title>""", parser=parser)

New playcount:

playcount = tree.find('Playcount')
playcount.text = "2"

Delete genres (not first):

title = tree.xpath('/Title')[0]

# Select every Genre after the first and detach it from its parent
for element in title.xpath('Genre[position() > 1]'):
    element.getparent().remove(element)

New genre:

genre = etree.Element("Genre")
genre.text = "New Genre"
tree.xpath('/Title/Genre[last()]')[0].addnext(genre)

Result:

print etree.tostring(tree, pretty_print=True)
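
The answer above does not cover the third question (changing Baseball to Sports). The following is a minimal sketch of all three edits, written against the standard-library xml.etree.ElementTree instead of lxml so that it runs without extra packages. The helper names (genre_texts, rename_genre, add_genre, keep_first_genre) are made up for illustration, and the sample data swaps Comedy for Baseball so the rename has something to match.

```python
import xml.etree.ElementTree as ET

# Sample data (assumption: one genre changed to Baseball for the demo)
SAMPLE = """<Title>
  <Playcount>1</Playcount>
  <Genre>Adventure</Genre>
  <Genre>Baseball</Genre>
  <Genre>Action</Genre>
</Title>"""


def genre_texts(title):
    """Return the text of every Genre child, in document order."""
    return [genre.text for genre in title.findall("Genre")]


def rename_genre(title, old, new):
    """Change the text of every Genre equal to `old` into `new`."""
    for genre in title.findall("Genre"):
        if genre.text == old:
            genre.text = new


def add_genre(title, name):
    """Insert a new Genre right after the last existing one (or append)."""
    new_genre = ET.Element("Genre")
    new_genre.text = name
    genres = title.findall("Genre")
    if genres:
        position = list(title).index(genres[-1]) + 1
        title.insert(position, new_genre)
    else:
        title.append(new_genre)
    return new_genre


def keep_first_genre(title):
    """Remove every Genre child except the first."""
    for extra in title.findall("Genre")[1:]:
        title.remove(extra)


title = ET.fromstring(SAMPLE)
rename_genre(title, "Baseball", "Sports")
add_genre(title, "Western")
print(genre_texts(title))
keep_first_genre(title)
print(genre_texts(title))
```

With lxml the same loops work unchanged, since findall(), remove() and insert() are part of the ElementTree API that lxml also implements.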



Answer 2:


Consider an XSLT solution when the task is to manipulate existing XML files. As mentioned on a related PHP question, XSLT is a special-purpose, declarative programming language whose scripts are themselves well-formed XML files, and it can handle multiple tasks in one script, as illustrated below.

Most general-purpose languages, including Python (lxml module), PHP (xsl extension), Java (javax.xml), Perl (libxml), C# (System.Xml), and VB (MSXML), provide XSLT 1.0 processors. External executable processors such as Xalan and Saxon (the latter can run XSLT 2.0 and, more recently, 3.0) are also available, and Python can call them with subprocess.call().
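
As a rough sketch of the subprocess.call() route, the snippet below builds a command line for Saxon's Java transformer and wraps the call in a helper. It assumes Java is installed and a Saxon jar is available; the jar file name "saxon-he.jar" and the helper names are placeholders, and the -s:, -xsl: and -o: options should be checked against the documentation of the Saxon version actually installed.

```python
import subprocess


def saxon_command(jar_path, xml_path, xsl_path, out_path):
    """Build the argument list for Saxon's command-line transformer."""
    return [
        "java", "-jar", jar_path,  # jar_path is a placeholder location
        "-s:" + xml_path,          # source XML document
        "-xsl:" + xsl_path,        # stylesheet to apply
        "-o:" + out_path,          # where to write the result
    ]


def run_saxon(jar_path, xml_path, xsl_path, out_path):
    """Run the transformation and return the process exit code."""
    return subprocess.call(saxon_command(jar_path, xml_path, xsl_path, out_path))


# Only build and show the command here; run_saxon() would execute it.
command = saxon_command("saxon-he.jar", "Input.xml", "XSLTScript.xsl", "Output.xml")
print(" ".join(command))
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems when file names contain spaces.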

The XSLT and Python scripts follow in that order; the Python script loads the XSLT file. As mentioned above, the XSLT is portable to other languages and platforms.

XSLT script (save as .xsl or .xslt)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- IDENTITY TRANSFORM (COPY CONTENT AS IS) -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>  

  <!-- CHANGE PLAYCOUNT -->
  <xsl:template match="Playcount">
    <xsl:copy>newvalue</xsl:copy>
  </xsl:template>

  <!-- EMPTY TEMPLATE TO REMOVE NODES BY POSITION -->
  <xsl:template match="Genre[position() &gt; 1]"></xsl:template>

  <!-- ADD NEW GENRE -->
  <xsl:template match="Title">
    <xsl:copy>
      <xsl:apply-templates/>
      <Genre>new</Genre>
    </xsl:copy>
  </xsl:template>

  <!-- CHANGE BASEBALL GENRE TO SPORTS (PRIORITY 0 SO THE REMOVAL TEMPLATE ABOVE STILL WINS) -->
  <xsl:template match="Genre[.='Baseball']" priority="0">
    <xsl:copy>Sports</xsl:copy>
  </xsl:template>

</xsl:transform>

Python Script

import lxml.etree as ET

# LOAD XML AND XSLT FILES
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')

# TRANSFORM INTO DOM OBJECT
transform = ET.XSLT(xslt)
newdom = transform(dom)

# OUTPUT TO PRETTY PRINT STRING
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
print(tree_out.decode("utf-8"))

# SAVE AS FILE (BINARY MODE, SINCE tostring() RETURNS BYTES HERE)
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)

Result (all of the questions above are handled below, except Baseball, which was not present in the posted data)

<?xml version='1.0' encoding='UTF-8'?>
<Title>
  <Playcount>newvalue</Playcount>
  <Genre>Adventure</Genre>
  <Genre>new</Genre>
</Title>
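
Because the posted data contains no Baseball genre, the result above cannot show that rule firing. The sketch below isolates it: an identity transform plus a single template that rewrites any Genre whose text is Baseball, applied to made-up input. It assumes lxml is installed; the input document and the variable names are invented for the demonstration.

```python
from lxml import etree

# Stylesheet: identity transform plus one rule for Baseball genres
XSLT_SOURCE = b"""<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:strip-space elements="*"/>

  <!-- copy everything as is unless a more specific rule applies -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- rewrite the text of any Genre that equals 'Baseball' -->
  <xsl:template match="Genre[.='Baseball']">
    <xsl:copy>Sports</xsl:copy>
  </xsl:template>
</xsl:transform>"""

# Made-up input with two Baseball entries (not the asker's data)
XML_SOURCE = b"""<Title>
  <Playcount>1</Playcount>
  <Genre>Baseball</Genre>
  <Genre>Comedy</Genre>
  <Genre>Baseball</Genre>
</Title>"""

transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(XML_SOURCE))

# Collect the genre texts from the transformed document
genres = [genre.text for genre in result.getroot().findall("Genre")]
print(genres)
```

The pattern matches the Genre element itself, so only that element's content changes and the rest of Title is copied by the identity template.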


Source: https://stackoverflow.com/questions/35977447/python-elementtree-xml-modifying-elements-with-multiple-values
