Write xml utf-8 file with utf-8 data with ElementTree

Submitted by 你。 on 2019-11-29 12:36:13

Question


I'm trying to write an XML file with UTF-8-encoded data using ElementTree, like this:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import xml.etree.ElementTree as ET
import codecs

testtag = ET.Element('unicodetag')
testtag.text = u'Töreboda'  # 'ö' is o with a diaeresis (two dots over it), a non-ASCII character
expfile = codecs.open('testunicode.xml',"w","utf-8-sig")
ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
expfile.close()

This fails with the following error:

Traceback (most recent call last):
  File "unicodetest.py", line 10, in <module>
    ET.ElementTree(testtag).write(expfile,encoding="UTF-8",xml_declaration=True)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 815, in write
    serialize(write, self._root, encoding, qnames, namespaces)    
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 932, in _serialize_xml
    write(_escape_cdata(text, encoding))
  File "/usr/lib/python2.7/codecs.py", line 691, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Using the "us-ascii" encoding instead works, but it doesn't preserve the Unicode characters in the data. What is happening?
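A small check of what the "us-ascii" output actually contains, written for Python 3 (an assumption; the question uses Python 2.7). ElementTree escapes characters that the target encoding cannot represent as numeric character references, so `ö` comes out as `&#246;`, and an XML parser turns that reference back into `ö`. The raw bytes are not literal UTF-8, but the text itself is escaped rather than dropped:

```python
import xml.etree.ElementTree as ET

testtag = ET.Element("unicodetag")
testtag.text = "Töreboda"

# With us-ascii, non-ASCII characters are written as numeric character references.
ascii_xml = ET.tostring(testtag, encoding="us-ascii")
print(ascii_xml)  # b'<unicodetag>T&#246;reboda</unicodetag>'

# An XML parser resolves the reference back to the original character.
roundtrip = ET.fromstring(ascii_xml).text
print(roundtrip)  # Töreboda
```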


Answer 1:


codecs.open expects Unicode strings to be written to the file object, and it handles the encoding to UTF-8 itself. ElementTree's write(), however, already encodes the Unicode strings to UTF-8 byte strings before passing them to the file object. Since the file object wants Unicode strings, it coerces each byte string back to Unicode using the default ASCII codec. That decode fails on the first non-ASCII byte (0xc3, the first byte of ö in UTF-8), which is the UnicodeDecodeError in the traceback.
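The failure can be reproduced in isolation. This Python 3 sketch (the function name `reproduce_double_encoding` is hypothetical) performs explicitly the two steps that happened implicitly inside Python 2.7: encode the text to UTF-8 bytes, then decode those bytes with the ASCII codec. It fails on the same byte and position as the traceback in the question:

```python
def reproduce_double_encoding(text="Töreboda"):
    # Step 1: ElementTree, asked for encoding="UTF-8", produces UTF-8 bytes.
    utf8_bytes = text.encode("utf-8")  # b'T\xc3\xb6reboda'
    # Step 2: the codecs writer expected text, so Python 2 implicitly decoded
    # the bytes with the default ASCII codec. Doing it explicitly fails the same way.
    try:
        utf8_bytes.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = reproduce_double_encoding()
print(message)  # 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
```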

Just do this:

# No codecs.open() wrapper: pass the filename and let ElementTree open and encode the file itself
ET.ElementTree(testtag).write('testunicode.xml', encoding="UTF-8", xml_declaration=True)
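A minimal Python 3 sketch of the same idea (an assumption; the answer targets Python 2.7): let ElementTree perform the only encoding step. Here the file is opened in binary mode ("wb") instead of through codecs.open(). `write_utf8_xml` and its `bom` flag are hypothetical helpers; the flag only mimics the byte order mark that the question's "utf-8-sig" codec would have added:

```python
import codecs
import os
import tempfile
import xml.etree.ElementTree as ET


def write_utf8_xml(root, path, bom=False):
    """Hypothetical helper: serialize `root` to `path` as UTF-8 XML.

    The file is opened in binary mode, so ElementTree does all the encoding
    and nothing re-encodes its output a second time.
    """
    with open(path, "wb") as f:
        if bom:
            # Optional UTF-8 byte order mark, like the "utf-8-sig" codec in the question.
            f.write(codecs.BOM_UTF8)
        ET.ElementTree(root).write(f, encoding="UTF-8", xml_declaration=True)


testtag = ET.Element("unicodetag")
testtag.text = "Töreboda"

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.xml")
    with_bom = os.path.join(tmp, "bom.xml")
    write_utf8_xml(testtag, plain)
    write_utf8_xml(testtag, with_bom, bom=True)

    with open(plain, "rb") as f:
        plain_bytes = f.read()
    with open(with_bom, "rb") as f:
        bom_bytes = f.read()

    # Parsing the file back recovers the original text.
    parsed_text = ET.parse(plain).getroot().text

print(plain_bytes)
print(parsed_text)
```

Passing a plain filename, as in the answer, also works in Python 3. What typically does not work is handing `write(..., encoding="UTF-8")` a file opened in text mode, because ElementTree then produces bytes for a stream that expects str.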


Source: https://stackoverflow.com/questions/10046755/write-xml-utf-8-file-with-utf-8-data-with-elementtree
