Say I have a variable containing bytes:
>>> a = b'Hello World'
It can be verified with:
>>> type(a)
<class 'bytes'>
Now I try to convert a into a string with str():
>>> b = str(a)
and sure enough it is a string:
>>> type(b)
<class 'str'>
Now I try to print b, but I get a totally unexpected result:
>>> print(b)
b'Hello World'
It returns a string, as I would expect, but it also keeps the b (the bytes prefix) and the ' (quotation marks).
Why does it do this, and not just print the message between the quotation marks?
Don't think of a bytes value as a string in some default 8-bit encoding. It's just binary data. As such, str(a) returns an encoding-agnostic string that represents the value of the bytes object. If you want 'Hello World', be specific and decode the value.
>>> b = a.decode()
>>> type(b)
<class 'str'>
>>> print(b)
Hello World
In Python 2, the distinction between bytes and text was blurred. Python 3 went to great lengths to separate the two: bytes for binary data, and str for readable text.
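As a minimal sketch of that separation (the variable names are only for illustration), text becomes bytes with encode(), bytes become text with decode(), and Python 3 refuses to mix the two implicitly:

```python
text = "Hello World"

# str -> bytes: an encoding turns characters into raw byte values
data = text.encode("utf-8")

# bytes -> str: the same encoding turns the byte values back into characters
restored = data.decode("utf-8")

# Python 3 does not silently combine text and binary data
mixing_error = None
try:
    text + data
except TypeError as exc:
    mixing_error = exc

print(type(data).__name__, type(restored).__name__)
print(restored == text)
```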
For another perspective, compare
>>> list("Hello")
['H', 'e', 'l', 'l', 'o']
with
>>> list(b"Hello")
[72, 101, 108, 108, 111]
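The same point shows up when indexing. A short sketch, using made-up variable names: a single element of a bytes object is an integer, while a slice is still bytes, and the object can be rebuilt from plain integers.

```python
data = b"Hello"

first_item = data[0]     # indexing yields an int, not a 1-character string
first_slice = data[:1]   # slicing yields another bytes object

# A bytes object can be rebuilt from a sequence of integers in range(256)
rebuilt = bytes([72, 101, 108, 108, 111])

print(first_item, first_slice, rebuilt)
```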
What str(a) does here is ask the object for a text conversion via a.__str__. bytes does not provide one that decodes its contents, so the result is the same as a.__repr__(): the literal you would type to recreate the object in the REPL.
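A quick way to check this yourself, as a sketch: str() and repr() give the same text for a bytes object, and because that text is a valid literal, ast.literal_eval can turn it back into the original bytes. The second step is only an illustration of "the string required to create this object", not a recommended conversion path.

```python
import ast

a = b'Hello World'

as_text = str(a)            # no encoding given, so nothing is decoded
same_as_repr = as_text == repr(a)

# The text is a bytes literal, so evaluating it recreates the object
recreated = ast.literal_eval(as_text)

print(as_text)
print(same_as_repr, recreated == a)
```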
If you think about it, just converting bytes to a str makes little sense, as you need to know the encoding. You can use bytes.decode(encoding) to convert bytes to str properly.
>>> a.decode("utf-8")
The encoding can also be omitted, in which case it defaults to 'utf-8'.
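To see why the encoding matters, here is a small sketch with a non-ASCII character (the sample text and variable names are my own): the same bytes give different text under different encodings, and a wrong guess can fail outright unless you pass an errors argument.

```python
data = "caf\u00e9".encode("utf-8")   # 'café' as UTF-8: b'caf\xc3\xa9'

utf8_text = data.decode("utf-8")      # correct encoding: 'café'
latin1_text = data.decode("latin-1")  # wrong encoding, no error: 'cafÃ©'

# ASCII cannot represent bytes >= 0x80, so strict decoding raises
ascii_failed = False
try:
    data.decode("ascii")
except UnicodeDecodeError:
    ascii_failed = True

# errors="replace" substitutes U+FFFD for each undecodable byte
lenient = data.decode("ascii", errors="replace")

print(utf8_text, latin1_text, ascii_failed, lenient)
```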
str usually transforms an object into a string that represents it. For a bytes object there is no better representation than the b'...' literal. You probably want to use decode, where you also specify the encoding that should be used when converting the bytes to a string.
In Python 3.x, when you convert a byte string with str(s), you get a new string reading b'Hello World' (keeping the "b" prefix that denotes a byte string). This is because bytes does not define a __str__ that decodes its contents, so the result is the same as __repr__, which returns the literal representation of the object (i.e. the quoted text preceded by "b"). For example:
>>> a = b'Hello World'
>>> str(a)
"b'Hello World'"
There are two ways to convert a bytes-like object to a string:
Decode the byte string: call decode() on your byte string a:
>>> a.decode()
'Hello World'
Pass an encoding to str(), here utf-8:
>>> str(a, 'utf-8')
'Hello World'
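If a value might arrive as either bytes or str, the two approaches above can be wrapped in a small helper. This is only a sketch: to_text is a hypothetical name, and defaulting to UTF-8 is an assumption about your data.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes or bytearray.

    Hypothetical helper; the UTF-8 default is an assumption about the input.
    """
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return str(value)


a = b'Hello World'

# Both spellings from the answer produce the same result
via_decode = a.decode('utf-8')
via_str = str(a, 'utf-8')

print(to_text(a), to_text('already text'), via_decode == via_str)
```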
Source: https://stackoverflow.com/questions/49681951/converting-bytes-to-string-with-str-returns-string-with-speech-marks