French and lxml text

Posted by 不打扰是莪最后的温柔 on 2020-01-06 06:07:06

Question


I'm trying to assign a valid French text string to an element's text using lxml:

from lxml import etree

el = etree.Element("someelement")
el.text = 'Disponible à partir du 1er Octobre'

I get the error:

ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

I've also tried:

el.text = etree.CDATA('Disponible à partir du 1er Octobre')

However I get the same error.

How do I handle French in XML, in particular, ISO-8859-1? There are ways to specify encoding within the tostring() function in lxml, but not for assigning text values within elements.


Answer 1:


If the text contains non-ASCII data, you should provide it to el.text as a Unicode string.

As @Abbasov Alexander's answer shows, you can do this with a Unicode literal, u''. Python hasn't raised an exception, so I assume you've declared a character encoding for your Python source file (e.g., with a # coding: utf-8 comment at the top). That encoding defines how Python interprets non-ASCII characters in the source; it is unrelated to the encoding you use to save the XML to a file.

If the text is already in a variable and you haven't converted it to Unicode yet, you can do so with text.decode(text_encoding) (text_encoding may be unrelated to the Python source encoding).
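As a minimal sketch of that decode step, assuming Python 3 and assuming the data arrives as ISO-8859-1 bytes (the raw variable and its contents are made up for illustration), something like the following should work. The fallback import of the standard library's xml.etree.ElementTree is only there so the sketch runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API, which is similar enough here.
    import xml.etree.ElementTree as etree

# Hypothetical input: bytes as they might arrive from an ISO-8859-1 source.
raw = b'Disponible \xe0 partir du 1er Octobre'
text_encoding = 'iso-8859-1'  # assumption: the actual encoding of the data

# Decode bytes to Unicode before handing the text to the element.
text = raw.decode(text_encoding)

el = etree.Element("someelement")
el.text = text
```

If text_encoding is wrong, decode() may either raise UnicodeDecodeError or silently produce the wrong characters, so it is worth confirming where the bytes came from.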

The confusing bit might be that el.text (as an optimization) returns a bytestring on Python 2 for pure ASCII data. This breaks the rule that you should not mix bytes and Unicode strings, though it should work if sys.getdefaultencoding() returns an ASCII-based encoding, as it does in most cases.

To save the XML, pass whatever character encoding you need to the tostring() or ElementTree.write() functions. Again, this encoding is unrelated to the other encodings mentioned above.
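A short sketch of choosing the output encoding at serialization time, assuming Python 3 (the variable names are illustrative, and the stdlib fallback import is only there so the example runs without lxml):

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib API, which accepts the same argument.
    import xml.etree.ElementTree as etree

el = etree.Element("someelement")
el.text = u'Disponible \xe0 partir du 1er Octobre'  # Unicode text in memory

# The same element serialized to two different byte encodings.
latin1_bytes = etree.tostring(el, encoding='iso-8859-1')
utf8_bytes = etree.tostring(el, encoding='utf-8')
```

The element itself holds Unicode text in both cases; only the bytes produced by tostring() differ.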

In general, use the Unicode sandwich: decode bytes to Unicode as soon as you receive them, work with Unicode text inside your program, and encode to bytes as late as possible, when you need to send the text through an API that doesn't support Unicode (files, network).
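The sandwich can be sketched with plain files, assuming Python 3. The file names, the temporary directory, and the upper-casing step are all hypothetical; io.open() does the decoding and encoding at the boundaries.

```python
import io
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'in_latin1.txt')   # hypothetical input file
dst = os.path.join(tmp_dir, 'out_utf8.txt')    # hypothetical output file

# Set up sample input: ISO-8859-1 bytes on disk.
with open(src, 'wb') as f:
    f.write(b'Disponible \xe0 partir du 1er Octobre')

# Bottom slice: decode as soon as the bytes enter the program.
with io.open(src, encoding='iso-8859-1') as f:
    text = f.read()

# Filling: work only with Unicode text.
shouted = text.upper()

# Top slice: encode as late as possible, on the way out.
with io.open(dst, 'w', encoding='utf-8') as f:
    f.write(shouted)
```

The input and output encodings are chosen independently, which is the point of keeping everything in between as Unicode.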




Answer 2:


If you are on Python 2 (any version < 3), you can try: el.text = u'Disponible à partir du 1er Octobre'



Source: https://stackoverflow.com/questions/46870889/python-lxml-valueerror-all-strings-must-be-xml-compatible
