Python strings and str() method encoding and decoding

最后都变了 - submitted on 2019-12-10 10:59:05

Question


I see that the Python manual mentions the .encode() and .decode() string methods. Playing around on the Python CLI, I see that I can create unicode strings such as u'hello', which have a different datatype than a 'regular' string 'hello', and that I can convert / cast with str(). But the real problems start when using characters above ASCII 127, such as u'שלום', and I am having a hard time determining empirically exactly what is happening.

Stack Overflow is overflowing with examples of confusion regarding Python's unicode and string-encoding/decoding handling.

What exactly happens (how are the bytes changed, and how is the datatype changed) when encoding and decoding strings with the str() method, especially when characters that cannot be represented in 7 bits are included in the string? Is it true, as it seems, that a Python variable with datatype <type 'str'> can be both encoded and decoded? If it is encoded, I understand that means that the string is represented by UTF-8, ISO-8859-1, or some other encoding. Is this correct? If it is decoded, what does this mean? Are decoded strings unicode? If so, then why don't they have the datatype <type 'unicode'>?

In the interest of those who will read this later, I think that both Python 2 and Python 3 should be addressed. Thank you!


Answer 1:


A str that can be both encoded and decoded is only the case in Python 2. The existence of an encode method on Python 2's byte strings, alongside decode, is a wart, which has been changed in Python 3 (where the equivalent type, bytes, has only decode).

You can't 'encode' an already-encoded string. What happens when you do call encode on a str is that Python implicitly calls decode on it using the default encoding, which is usually ASCII. This is almost always not what you want. You should always call decode to convert a str to unicode before converting it to a different encoding.
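The implicit step described above can be made visible with a small sketch. It is written for Python 3 so that it runs on a current interpreter, and it only imitates Python 2's behaviour: `py2_style_encode` is a hypothetical helper invented for this illustration, not a real API, and ASCII is assumed as the default encoding.

```python
# The Hebrew word from the question, written with escapes.
word = '\u05e9\u05dc\u05d5\u05dd'

# UTF-8 bytes, comparable to what a Python 2 <type 'str'> would hold.
raw = word.encode('utf-8')


def py2_style_encode(byte_string, target_encoding, default_encoding='ascii'):
    """Hypothetical helper: imitate Python 2's str.encode().

    Python 2 first decodes the byte string with the default encoding
    (assumed to be ASCII here) and only then encodes the result.
    """
    text = byte_string.decode(default_encoding)  # the implicit, hidden step
    return text.encode(target_encoding)


# Pure ASCII data survives the hidden decode, so the problem goes unnoticed.
ascii_result = py2_style_encode(b'hello', 'utf-8')

# Non-ASCII bytes fail in the hidden decode, before any encoding happens.
try:
    py2_style_encode(raw, 'utf-8')
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__

# The reliable route: decode with the encoding the bytes are really in,
# then encode to the target encoding.
text = raw.decode('utf-8')
reencoded = text.encode('iso-8859-8')
```

This is why calling encode on a non-ASCII Python 2 str reports a decode error: the failure comes from the hidden decode, not from the encode that was requested.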

(And decoded strings are unicode, and they do have type <type 'unicode'>, so I don't know what you mean by that question.)

In Python 3 of course strings are unicode by default. You can only encode them to bytes - which, as I mention above, can only be decoded.
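A minimal Python 3 sketch of that one-way split follows. The sample text and variable names are chosen here for illustration only.

```python
# In Python 3 a plain string literal is already unicode text.
text = 'hello \u05e9\u05dc\u05d5\u05dd'

# Text can only be encoded; the result is a bytes object.
data = text.encode('utf-8')

# Bytes can only be decoded; the result is text again.
roundtrip = data.decode('utf-8')

# The Python 2 wart is gone: each type has exactly one of the two methods.
str_has_decode = hasattr(text, 'decode')
bytes_has_encode = hasattr(data, 'encode')

print(type(text).__name__, type(data).__name__)
print(str_has_decode, bytes_has_encode)
```

Because each type offers only one direction, the implicit ASCII decode from Python 2 has nowhere to happen.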



Source: https://stackoverflow.com/questions/17063502/python-strings-and-str-method-encoding-and-decoding
