Python3: Decode UTF-8 bytes converted as string

☆樱花仙子☆ submitted on 2020-01-06 05:06:22

Question


Suppose I have something like:

a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)

which returns a string of the form:

b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'

Now it is sent as a plain string (I get it as an assertion from the eval function). How on earth can I get the original UTF-8 word back from it? If there is a better way to do this than str(bytes(x)), I would be glad to hear it.


Answer 1:


If you want to encode and decode text, that's what the encode and decode methods are for:

>>> a = "Gżegżółka"
>>> b = a.encode('utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = b.decode('utf-8')
>>> c
'Gżegżółka'

Also, notice that UTF-8 is already the default, so you can just do this:

>>> b = a.encode()
>>> c = b.decode()

The only reasons you need to specify arguments are:

  • You need to use some other encoding instead of UTF-8,
  • You need to specify a particular error handler, like 'surrogateescape' instead of 'strict', or
  • Your code has to run in Python 3.0-3.1 (which almost nobody used).
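As a rough illustration of the error-handler point, the sketch below decodes bytes that are deliberately not valid UTF-8 (the trailing 0xff byte is an assumption chosen for the demo) under three of the standard handlers. It is not an exhaustive tour of the codecs error handlers.

```python
# Bytes that are not valid UTF-8: 0xff can never appear in UTF-8 data.
raw = b"G\xc5\xbceg\xff"

# The default 'strict' handler raises UnicodeDecodeError on the bad byte.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict:", exc.reason)

# 'replace' substitutes U+FFFD for each undecodable byte (lossy).
replaced = raw.decode("utf-8", errors="replace")
print(replaced)

# 'surrogateescape' maps each bad byte to a lone surrogate, so encoding
# with the same handler restores the original bytes exactly.
text = raw.decode("utf-8", errors="surrogateescape")
assert text.encode("utf-8", errors="surrogateescape") == raw
```

The surrogateescape round trip is the reason to pick it over 'replace' when the bytes must come back unchanged.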

However, if you really want to, you can do what you were already doing; you just need to explicitly specify the encoding in the str call, just as you did in the bytes call:

>>> a = "Gżegżółka"
>>> b = bytes(a, 'utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = str(b, 'utf-8')
>>> c
'Gżegżółka'

Calling str on a bytes object without an encoding, as you were doing, does not decode it. Unlike calling bytes on a str without an encoding, it doesn't raise an exception either, because the main job of str is to give you a string representation of the object, and the best string representation of a bytes object is that b'…' form.
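A quick way to confirm that str(b) only produces a representation: the sketch below compares it with repr(b) and then decodes with an explicit encoding. (If Python is started with the -b flag, the encoding-less str(b) call also emits a BytesWarning.)

```python
b = "Gżegżółka".encode("utf-8")

# Without an encoding, str() returns the repr of the bytes object;
# no decoding takes place.
as_repr = str(b)
assert as_repr == repr(b)
assert as_repr.startswith("b'")

# With an encoding, str() decodes, exactly like b.decode(...).
decoded = str(b, "utf-8")
assert decoded == b.decode("utf-8") == "Gżegżółka"
```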




Answer 2:


I found it. The simplest way to convert a string representation of bytes back into bytes is the eval function:

a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a) #this is the input we deal with

a = eval(a) #that's how we transform a into bytes
a = str(a, 'utf-8') #...and now we convert it into string

print(a)
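One caveat about the eval approach: eval executes arbitrary Python, so it is only safe if you fully trust where the string came from. A safer alternative, swapped in here and not part of the original answer, is ast.literal_eval from the standard library, which accepts only Python literals such as b'…'. The helper name bytes_repr_to_text is hypothetical; this is a minimal sketch assuming the input is the repr of a UTF-8 encoded bytes object.

```python
import ast


def bytes_repr_to_text(s: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper: turn the repr of a bytes object back into text."""
    value = ast.literal_eval(s)  # evaluates literals only; never runs code
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value.decode(encoding)


received = str("Gżegżółka".encode("utf-8"))  # the kind of input from the question
print(bytes_repr_to_text(received))
```

Anything that is not a plain literal (for example a function call) makes ast.literal_eval raise ValueError instead of executing it.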


Source: https://stackoverflow.com/questions/51200908/python3-decode-utf-8-bytes-converted-as-string
