Converting bytes to string with str() returns string with speech marks

Posted by ↘锁芯ラ on 2019-12-04 05:04:30

Question


Say I have a variable containing bytes:

>>> a = b'Hello World'

It can be verified with:

>>> type(a)
<class 'bytes'>

Now I try and convert a into a string with str():

>>> b = str(a)

and sure enough it is a string:

>>> type(b)
<class 'str'>

Now I try and print b but I get a totally unexpected result:

>>> print(b)
b'Hello World'

It returns a string, as I would expect, but it also keeps the b (the bytes prefix) and the ' (quotation marks).

Why does it do this, and not just print the message between the quotation marks?


Answer 1:


Don't think of a bytes value as a string in some default 8-bit encoding. It's just binary data. As such, str(a) returns an encoding-agnostic string to represent the value of the byte string. If you want 'Hello World', be specific and decode the value.

>>> b = a.decode()
>>> type(b)
<class 'str'>
>>> print(b)
Hello World

In Python 2, the distinction between bytes and text was blurred. Python 3 went to great lengths to separate the two: bytes for binary data, and str for readable text.

For another perspective, compare

>>> list("Hello")
['H', 'e', 'l', 'l', 'o']

with

>>> list(b"Hello")
[72, 101, 108, 108, 111]
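The same contrast can be sketched for the value from the question. This is only an illustration; the names as_repr and as_text are made up here and do not come from the original post.

```python
a = b'Hello World'

# str() without an encoding returns the printable representation,
# which is the same text that repr() produces.
as_repr = str(a)

# decode() interprets the bytes as text; UTF-8 is the default encoding.
as_text = a.decode()

print(as_repr == repr(a))   # True
print(as_text)              # Hello World

# Indexing bytes yields an integer, indexing str yields a character.
print(a[0], as_text[0])     # 72 H
```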



Answer 2:


What str(a) does here is call a.__str__(), which for bytes gives the same result as __repr__: the string you would type to create this object in the REPL. No decoding takes place, because no encoding was given.

If you think about it, just converting bytes to a str makes little sense, as you need to know the encoding. You can use bytes.decode(encoding) to convert bytes to str properly.

a.decode("utf-8")

The encoding argument can also be omitted, in which case it defaults to "utf-8".
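As a rough illustration of why the encoding matters, the sketch below decodes the same two bytes with different codecs. The sample character and the variable names are chosen for this example only.

```python
# The character U+00E9 (e with acute accent) takes two bytes in UTF-8.
data = '\u00e9'.encode('utf-8')          # b'\xc3\xa9'

as_utf8 = data.decode('utf-8')           # one character, the intended text
as_latin1 = data.decode('latin-1')       # two characters, wrong codec, no error

# A codec that cannot interpret the bytes raises UnicodeDecodeError ...
try:
    data.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# ... unless an error handler is supplied; 'replace' substitutes U+FFFD.
as_ascii_lossy = data.decode('ascii', errors='replace')

print(len(data), len(as_utf8), len(as_latin1), ascii_failed)  # 2 1 2 True
```

Decoding with the wrong codec does not always fail loudly: latin-1 accepts any byte value, so it quietly produces the wrong text.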




Answer 3:


str usually transforms an object into a string that represents it. For a bytes object, there is no better representation than b'contents'. You probably want to use decode, where you also specify the encoding that should be used when converting the bytes object to a string.




Answer 4:


In Python 3.x, when you convert a byte string using str(a) without an encoding, it creates a new string "b'Hello World'" (keeping the "b" prefix that denotes a byte string). This is because, for a byte string, __str__ gives the same result as __repr__, which returns the literal form of the object (i.e. the string preceded by "b"). For example:

>>> a = b'Hello World'
>>> str(a)
"b'Hello World'"

There are two ways to convert a bytes-like object to a string:

  1. Decode the byte string: call decode() on your byte string a:

    >>> a.decode()
    'Hello World'
    
  2. Pass the encoding to str(), which decodes the byte string as UTF-8:

    >>> str(a, 'utf-8')
    'Hello World'
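A small sketch comparing the two approaches, under the assumption that the data is valid UTF-8. The memoryview part is an extra illustration, not from the original answer: str() with an encoding accepts any bytes-like object, while decode() is only available on types that define it, such as bytes and bytearray.

```python
a = b'Hello World'

# Both spellings decode the same bytes to the same text.
via_decode = a.decode('utf-8')
via_str = str(a, 'utf-8')
print(via_decode == via_str)          # True

# str(obj, encoding) also works on other bytes-like objects.
view = memoryview(a)
from_view = str(view, 'utf-8')
print(from_view)                      # Hello World

# memoryview has no decode() method of its own.
print(hasattr(view, 'decode'))        # False
```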
    


Source: https://stackoverflow.com/questions/49681951/converting-bytes-to-string-with-str-returns-string-with-speech-marks
