Question
I am new to Python and am trying to modify some XML configuration files on my local system.
Input: I have an XML file (say, Test.xml) with the following content:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<SocketTimeout>5000</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
</Domain>
</JavaHost>
WHAT I WANT TO ACHIEVE: I want to do the following two things:
Part 1: I want to change the value of the SocketTimeout tag (only the one under the Composer tag) to 60, and also add a comment above it (e.g. "Changed this value to reduce SocketTimeout"). The file Test.xml should then look like this:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<!-- Changed this value to reduce SocketTimeout -->
<SocketTimeout>60</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
</Domain>
</JavaHost>
Part 2: In the file Test.xml, I also want to add a new tag under the Domain tag, as below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<!-- Changed this value to reduce SocketTimeout -->
<SocketTimeout>60</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
<New_tag>
<!-- New Tag -->
<Enabled>true</Enabled>
</New_tag>
</Domain>
</JavaHost>
That’s all I want :)
WHAT I HAVE TRIED:
To achieve this, I considered the following options:
Minidom/ElementTree/lxml: these remove the comments from the file and also change its formatting.
Regex: does not remove comments and does not disturb the formatting. Hence, I opted for regex. Below is what I started with, but it is not working :(
import os, re
# set the working directory
os.chdir('C:\\Users\\Dell\\Desktop\\')
# open the source file and read it
fh = open('C:\\Users\\Dell\\Desktop\\Test.xml', 'r')
subject = fh.read()
fh.close()
pattern = re.compile(r"\[<Composer>\].*?\[/<Composer>\]")
#Replace
result = pattern.sub(lambda match: match.group(0).replace('<SocketTimeout>500</SocketTimeout>','<SocketTimeout>60</SocketTimeout>') ,subject)
# write the file
f_out = open('C:\\Users\\Dell\\Desktop\\Test.xml', 'w')
f_out.write(result)
f_out.close()
Any ideas on how to implement this, or corrections to my mistakes, would be highly appreciated. Although I am new to Python, I will try my best to work through the suggestions.
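As for what is wrong with the regex attempt itself: the pattern escapes literal square brackets (\[ and \]) that never appear in the file, writes the closing tag as /<Composer> instead of </Composer>, and lets . stop at line breaks because re.DOTALL is not set. Even with a matching pattern, the replace() call looks for the value 500, while the SocketTimeout under Composer holds 5000. Below is a minimal sketch of a corrected pattern; set_composer_timeout is a hypothetical helper name, and the sketch assumes a single, non-nested Composer block with no "<Composer>" text inside comments, which is exactly the kind of fragility the answers below warn about.

```python
import re

# Hypothetical helper: rewrite only the <SocketTimeout> inside <Composer>...</Composer>.
# Assumes one Composer block, no nested Composer, and no "<Composer>" text in comments.
COMPOSER_BLOCK = re.compile(r"<Composer>.*?</Composer>", re.DOTALL)  # DOTALL: . also matches \n
TIMEOUT_TAG = re.compile(r"([ \t]*)<SocketTimeout>[^<]*</SocketTimeout>")  # any value, not just 500


def set_composer_timeout(xml_text, new_value, note):
    def fix_block(block_match):
        def fix_timeout(tag_match):
            indent = tag_match.group(1)  # keep the line's original indentation
            return (f"{indent}<!-- {note} -->\n"
                    f"{indent}<SocketTimeout>{new_value}</SocketTimeout>")
        # count=1: only the first SocketTimeout inside this Composer block
        return TIMEOUT_TAG.sub(fix_timeout, block_match.group(0), count=1)

    # count=1: only the first Composer block in the document
    return COMPOSER_BLOCK.sub(fix_block, xml_text, count=1)
```

To use it on the real file, read Test.xml into a string with open(..., encoding='utf-8'), pass it through set_composer_timeout(text, 60, 'Changed this value to reduce SocketTimeout'), and write the result back. The SocketTimeout under MessageProcessor is left alone because the substitution only runs inside the matched Composer block.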
Answer 1:
This is not exactly what you wanted, but it's close. For one thing, avoid regex for XML, HTML and similar processing like the plague. At the same time, don't be surprised if you run into occasional 'challenges' when using products like lxml; I hit one this time, described in the notes after the code.
from lxml import etree
tree = etree.parse('shivam.xml')
element_to_change = tree.xpath('.//Composer/SocketTimeout')[0]
print(element_to_change)
element_to_change.text='60'
comment_will_follow_this = tree.xpath('.//Composer')[0]
print(comment_will_follow_this)
comment = etree.Comment('This did not work')
comment_will_follow_this.append(comment)
comment = etree.Comment('Changed this value to reduce SocketTimeout')
element_to_change.addprevious(comment)
tree.write('see_it.xml', pretty_print=True)
- I used xpath to find the element to change and the places in the file that should receive the comments.
- The append method adds a comment (or any other node) to a given element as its last child. That is why, in this case, the 'This did not work' comment ended up at the end of Composer, just before </Composer>, instead of before SocketTimeout.
- However, addprevious was able to add the comment in the desired location, the fly in the ointment being that it does not put a line break between the comment and the next XML element. (The new comment node has no tail text; setting comment.tail = '\n' before writing adds the line break.)
Here's the resulting file. Incidentally, you will note that the original comments are intact.
<JavaHost>
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<!--Changed this value to reduce SocketTimeout--><SocketTimeout>60</SocketTimeout>
<Enabled>true</Enabled>
<!--This did not work--></Composer>
</Domain>
</JavaHost>
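For comparison, here is a minimal sketch of both edits (Part 1 and Part 2) using the standard library's xml.etree.ElementTree instead of lxml, swapped in only to avoid a third-party dependency; it needs Python 3.8+ for TreeBuilder(insert_comments=True), which keeps the comments in the tree. Two details matter for the question's Test.xml. First, it declares a default namespace (xmlns="SomeInfo/v1.1"), so unqualified paths like .//Composer find nothing and the tag names must be qualified (the output above has no xmlns, which suggests it was produced from a file without that declaration). Second, the missing line break after the new comment comes from the comment having no tail text, so the sketch gives it one. ElementTree keeps each node's text and tail whitespace, so the rest of the layout survives, but it does not reproduce the original XML declaration (it writes its own, without standalone="no"). The helper names q and edit are made up for this sketch.

```python
import xml.etree.ElementTree as ET

NS = "SomeInfo/v1.1"           # default namespace declared in the question's Test.xml
ET.register_namespace("", NS)  # serialize it back as xmlns="..." instead of ns0:

SOURCE = """<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<SocketTimeout>5000</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
</Domain>
</JavaHost>
"""


def q(tag):
    """Qualify a tag name with the document's default namespace."""
    return "{%s}%s" % (NS, tag)


def edit(xml_text):
    # insert_comments=True (Python 3.8+) keeps <!-- ... --> nodes in the tree
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    domain = root.find(q("Domain"))

    # Part 1: change only Composer/SocketTimeout and put a comment before it
    composer = domain.find(q("Composer"))
    timeout = composer.find(q("SocketTimeout"))
    timeout.text = "60"
    comment = ET.Comment(" Changed this value to reduce SocketTimeout ")
    comment.tail = composer.text  # reuse Composer's leading whitespace so the next tag keeps its own line
    composer.insert(list(composer).index(timeout), comment)

    # Part 2: append <New_tag> as the last child of Domain
    new_tag = ET.SubElement(domain, q("New_tag"))
    new_tag.text = "\n"
    new_comment = ET.Comment(" New Tag ")
    new_comment.tail = "\n"
    new_tag.append(new_comment)
    enabled = ET.SubElement(new_tag, q("Enabled"))
    enabled.text = "true"
    enabled.tail = "\n"
    new_tag.tail = "\n"  # put </Domain> back on its own line

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(edit(SOURCE))
```

For the real file, the same steps apply to ET.parse('Test.xml', parser=...) followed by tree.write('Test.xml', encoding='UTF-8', xml_declaration=True). Note that register_namespace changes module-wide state, which is fine for a one-off script.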
Answer 2:
Since you used "modify" and "XML" in the same sentence, consider XSLT, the special-purpose language designed to transform XML files. Python's lxml can run XSLT 1.0 scripts, as can external processors or other languages that Python can call at the command line, so XSLT is portable! Even better, Python can pass parameters to XSLT in case the hard-coded 50 in the script below needs to be adjusted dynamically (the question asks for 60), much like parameters in that other special-purpose language, SQL, for which Python has many APIs.
Specifically, XSLT provides the <xsl:comment> instruction and can append or rewrite nodes in the tree. Also, as others have commented and linked (and as a web search will confirm), it is ill-advised to use regex on XML/HTML documents, since regular expressions cannot reliably parse nested markup. Hence, DOM libraries such as Python's etree, lxml and minidom are preferred, as is XSLT, which is a W3C standard.
XSLT (save as an .xsl file, which is itself an XML file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*|comment()">
<xsl:copy>
<xsl:apply-templates select="node()|@*|comment()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Domain">
<xsl:copy>
<xsl:apply-templates select="*|@*|comment()"/>
<New_tag>
<xsl:comment>New Tag</xsl:comment>
<Enabled>true</Enabled>
</New_tag>
</xsl:copy>
</xsl:template>
<xsl:template match="Composer">
<xsl:copy>
<xsl:comment>Changed this value to reduce SocketTimeout</xsl:comment>
<SocketTimeout>50</SocketTimeout>
<xsl:apply-templates select="Enabled"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Python
import lxml.etree as et
# LOAD XML AND XSLT
dom = et.parse('Input.xml')
xslt = et.parse('XSLT_Script.xsl')
# TRANSFORM SOURCE
transform = et.XSLT(xslt)
newdom = transform(dom)
# OUTPUT TO CONSOLE
print(newdom)
# OUTPUT TO FILE
with open('Output.xml', 'wb') as f:
    f.write(newdom)
Output
<JavaHost>
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<!--Changed this value to reduce SocketTimeout-->
<SocketTimeout>50</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
<New_tag>
<!--New Tag-->
<Enabled>true</Enabled>
</New_tag>
</Domain>
</JavaHost>
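A minimal sketch, assuming lxml is installed, of two adjustments to the stylesheet above. First, the question's Test.xml declares a default namespace (xmlns="SomeInfo/v1.1"), and in XSLT 1.0 a bare match="Domain" only matches elements in no namespace, so on that file the Domain and Composer templates above would not fire and the identity template would copy everything unchanged (the output shown has no xmlns, so it was presumably produced from a file without it). Here the prefix n is bound to that URI, and the stylesheet's own default namespace puts the new literal elements in the same namespace. Second, the value is passed in as an xsl:param (the name timeout is chosen for this sketch) via etree.XSLT.strparam, defaulting to 60 as the question asks. The sketch also matches only Composer/SocketTimeout, so Enabled and any other Composer children are copied through untouched. Like the original, strip-space plus indent="yes" re-indents the output, so exact formatting is not preserved.

```python
from lxml import etree

# Input from the question, kept as bytes because lxml rejects str input
# that carries an encoding declaration.
SOURCE = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
<Domain>
<MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<SocketTimeout>500</SocketTimeout>
</MessageProcessor>
<!-- This comment should not be removed and all formating should be untouched -->
<Composer>
<SocketTimeout>5000</SocketTimeout>
<Enabled>true</Enabled>
</Composer>
</Domain>
</JavaHost>"""

# Namespace-aware variant: n: addresses the input's default namespace,
# xmlns="SomeInfo/v1.1" places New_tag/Enabled in that same namespace.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:n="SomeInfo/v1.1"
    xmlns="SomeInfo/v1.1"
    exclude-result-prefixes="n">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:param name="timeout" select="'60'"/>

  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <xsl:template match="n:Composer/n:SocketTimeout">
    <xsl:comment>Changed this value to reduce SocketTimeout</xsl:comment>
    <xsl:copy><xsl:value-of select="$timeout"/></xsl:copy>
  </xsl:template>

  <xsl:template match="n:Domain">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <New_tag>
        <xsl:comment>New Tag</xsl:comment>
        <Enabled>true</Enabled>
      </New_tag>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>"""

# Compile the stylesheet once and reuse it for every document
transform = etree.XSLT(etree.fromstring(STYLESHEET))


def apply_changes(xml_bytes, timeout="60"):
    """Run the stylesheet, passing the timeout in as an XSLT string parameter."""
    doc = etree.fromstring(xml_bytes)
    result = transform(doc, timeout=etree.XSLT.strparam(timeout))
    return str(result)  # serialized according to <xsl:output>
```

For a real file, read Test.xml in binary mode and pass the bytes to apply_changes; strparam takes care of quoting, so the value does not need to be wrapped in XPath quotes by hand.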
Source: https://stackoverflow.com/questions/48976701/editing-local-xml-file-using-python-and-regular-expression