I have tried to use the answer in this question, but can\'t make it work: How to create "virtual root" with Python's ElementTree?
Here\'s my code:
I couldn't find a solution to this problem either using vanilla ElementTree, and the solution proposed by demalexx created non-valid XML that was rejected by my application (DITA). What I propose is a workaround involving other modules and it works perfectly for me.
import re
# found no way for cleanly specify a <!DOCTYPE ...> stanza in ElementTree so
# so we substitute the current <?xml ... ?> stanza with a full <?xml... + <!DOCTYPE...
new_header = '<?xml version="1.0" encoding="UTF-8" ?>\n' \
'<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">\n'
target_xml = re.sub(u"\<\?xml .+?>", new_header, source_xml)
with open(filename, 'w') as catalog_file:
catalog_file.write(target_xml.encode('utf8'))
You could set xml_declaration argument on write function to False, so output won't have xml declaration with encoding, then just append what header you need manually. Actually if you set your encoding as 'utf-8' (lowercase), xml declaration won't be added too.
import xml.etree.cElementTree as ElementTree
tree = ElementTree.Element('tmx', {'version': '1.4a'})
ElementTree.SubElement(tree, 'header', {'adminlang': 'EN'})
ElementTree.SubElement(tree, 'body')
with open('myfile.tmx', 'wb') as f:
f.write('<?xml version="1.0" encoding="UTF-8" ?><!DOCTYPE tmx SYSTEM "tmx14a.dtd">'.encode('utf8'))
ElementTree.ElementTree(tree).write(f, 'utf-8')
Resulting file (newlines added manually for readability):
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE tmx SYSTEM "tmx14a.dtd">
<tmx version="1.4a">
<header adminlang="EN" />
<body />
</tmx>
I used different solution to add DOCTYPE, very simple, very stupid.
import xml.etree.ElementTree as ET
with open(path_file, "w", encoding='UTF-8') as xf:
doc_type = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE dlg:window ' \
'PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "dialog.dtd">'
tostring = ET.tostring(root).decode('utf-8')
file = f"{doc_type}{tostring}"
xf.write(file)
You could use lxml and its tostring function:
from lxml import etree
s = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4a"/>"""
tree = etree.fromstring(s)
header = etree.SubElement(tree,'header',{'adminlang': 'EN'})
body = etree.SubElement(tree,'body')
print etree.tostring(tree, encoding="UTF-8",
xml_declaration=True,
pretty_print=True,
doctype='<!DOCTYPE tmx SYSTEM "tmx14a.dtd">')
=>
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE tmx SYSTEM "tmx14a.dtd">
<tmx version="1.4a">
<header adminlang="EN"/>
<body/>
</tmx>