Question
I am trying to generate .xml files that contain Cyrillic characters, but the result is unexpected. What is the simplest way to avoid this? Example:
from lxml import etree
root = etree.Element('пример')
print(etree.tostring(root))
What I get is:
b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'
Instead of:
b'<пример/>'
Answer 1:
etree.tostring() without additional arguments outputs ASCII-only data as a bytes object, so non-ASCII characters are written as numeric character references. You could use etree.tounicode():
>>> from lxml import etree
>>> root = etree.Element('пример')
>>> print(etree.tostring(root))
b'<&#1087;&#1088;&#1080;&#1084;&#1077;&#1088;/>'
>>> print(etree.tounicode(root))
<пример/>
or specify a codec with the encoding argument; you'd still get bytes however, so the output would need to be decoded again:
>>> print(etree.tostring(root, encoding='utf8'))
b'<\xd0\xbf\xd1\x80\xd0\xb8\xd0\xbc\xd0\xb5\xd1\x80/>'
>>> print(etree.tostring(root, encoding='utf8').decode('utf8'))
<пример/>
Setting the encoding to 'unicode' gives you the same output that tounicode() produces, and is the preferred spelling:
>>> print(etree.tostring(root, encoding='unicode'))
<пример/>
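Since the question is about generating .xml files, here is a minimal sketch of writing the tree straight to disk with an explicit encoding. It is not part of the original answer: the file name example.xml, the temporary directory, and the sample text are hypothetical, and the standard-library fallback import is only there so the sketch also runs where lxml is not installed.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:
    # Assumption for this sketch: fall back to the standard library,
    # which provides the same calls used below.
    import xml.etree.ElementTree as etree

root = etree.Element('пример')
root.text = 'текст'  # hypothetical sample content

# Hypothetical output location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'example.xml')

# An explicit UTF-8 encoding keeps the Cyrillic characters as real
# characters in the file rather than numeric character references.
etree.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)

# Read the raw bytes back and decode them to inspect the result.
with open(path, 'rb') as f:
    data = f.read()

print(data.decode('utf-8'))
```

Writing with an XML declaration records the encoding in the file itself, so a parser reading it later does not have to guess.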
Source: https://stackoverflow.com/questions/29750592/what-is-right-way-to-use-cyrillic-in-python-lxml-library