Question
I have a library that gives me encoded and escaped byte sequences like this one:
a=b'\xc3\xa4\\n-\\t-\\"foo\\"'
I want to translate it back to:
ä
- -"foo"
I tried simply calling .decode() on a, which decodes the UTF-8 bytes as wanted:
>>> a.decode()
'ä\\n-\\t-\\"foo\\"'
But it does not un-escape. Then I found 'unicode_escape', which un-escapes the sequence but garbles the UTF-8 part:
>>> print(a.decode('unicode_escape'))
Ã¤
- -"foo"
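Note that the 'unicode_escape' codec interprets every byte that is not part of an escape sequence as Latin-1, so on its own it turns the UTF-8 bytes \xc3\xa4 into the two characters Ã¤ rather than ä. A commonly used combination of built-in codecs (not from the original post) undoes that Latin-1 step and then decodes as UTF-8. This is a minimal sketch, assuming the input only contains escapes that produce code points below U+0100 (such as \n, \t and \"); an escape like \u20ac would make the 'latin-1' step raise UnicodeEncodeError.

```python
a = b'\xc3\xa4\\n-\\t-\\"foo\\"'

# unicode_escape resolves \n, \t and \" but maps each raw byte to the
# Latin-1 character with the same value, so b'\xc3\xa4' becomes 'Ã¤'.
mojibake = a.decode('unicode_escape')
print(repr(mojibake))

# Encoding as Latin-1 maps those characters back to the same byte values
# one-to-one; decoding the result as UTF-8 then yields 'ä'.
fixed = mojibake.encode('latin-1').decode('utf-8')
print(fixed)
```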
Is there a way to decode and un-escape the given sequence with a built-in method (i.e. without having to chain .replace('\\n', '\n').replace(...))?
It would also be interesting to know how to revert this operation (i.e. how to get the same byte sequence back from the translated result).
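The built-in codecs do not reproduce this exact format in the reverse direction: for example, 'ä'.encode('unicode_escape') gives b'\\xe4' instead of the raw UTF-8 bytes. A minimal sketch of a hand-written reverse, assuming the library escapes exactly backslash, newline, tab and double quote and leaves every other character as raw UTF-8; escape_like_library and _ESCAPES are hypothetical names, and the table would need extending to match the library's real rules:

```python
# Hypothetical escape table, assumed to match what the library produces.
_ESCAPES = str.maketrans({
    '\\': '\\\\',
    '\n': '\\n',
    '\t': '\\t',
    '"': '\\"',
})


def escape_like_library(text: str) -> bytes:
    """Escape the characters listed in _ESCAPES, then encode as UTF-8."""
    return text.translate(_ESCAPES).encode('utf-8')


decoded = 'ä\n-\t-"foo"'
a = escape_like_library(decoded)
print(a)
```

str.translate substitutes each character exactly once, so unlike a chain of .replace() calls the result does not depend on the order in which the escapes are applied, and a backslash inserted for one escape is never re-escaped.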
Answer 1:
There is a way to do what I want, and I can almost go the other way too, but in my eyes it's ugly and incomplete, so I hope it's not the best option I have:
>>> import codecs
>>> decoded = codecs.escape_decode(a)[0].decode()
>>> print(decoded)
ä
- -"foo"
>>> reencoded = codecs.escape_encode(decoded.encode())
>>> print(reencoded)
(b'\\xc3\\xa4\\n-\\t-"foo"', 11)   <--- quotes are not escaped
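The ways the re-encoded bytes differ from the original input can be checked directly. Note that codecs.escape_decode and codecs.escape_encode are undocumented internals of the codecs module, so their behavior is not guaranteed across Python versions. A minimal sketch of the comparison, using the same input as above:

```python
import codecs

a = b'\xc3\xa4\\n-\\t-\\"foo\\"'

# escape_decode resolves the backslash escapes but leaves other raw bytes
# untouched, so the UTF-8 bytes for 'ä' survive and .decode() works.
decoded = codecs.escape_decode(a)[0].decode()

# escape_encode returns (escaped_bytes, number_of_input_bytes_consumed).
reencoded, consumed = codecs.escape_encode(decoded.encode())

# Difference 1: non-ASCII bytes come back as the literal text \xc3\xa4.
print(b'\\xc3\\xa4' in reencoded, b'\xc3\xa4' in reencoded)
# Difference 2: double quotes are left unescaped.
print(b'\\"' in reencoded)
```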
Source: https://stackoverflow.com/questions/41160817/how-combine-utf-8-and-unicode-escape-to-correctly-decode-b-xc3-xa4-n-t