how combine 'utf-8' and 'unicode_escape' to correctly decode b'\xc3\xa4\\n-\\t-\\“foo\\”'?

问题

I have a library that gives me encoded and escaped byte sequences like this one:

a=b'\xc3\xa4\\n-\\t-\\"foo\\"'

Which I want to translate back to:

ä
-   -"foo"

I tried to just .decode a which decodes the sequence as wanted:

>>> a.decode()
'ä\\n-\\t-\\"foo\\"'

But it does not un-escape. Then I found 'unicode_escape' and I got

>>> print(a.decode('unicode_escape'))
Ã¤
-   -"foo"

Is there a way to decode and unescape the given sequence with a builtin method (i.e. without having to .replace('\\n', '\n').replace(...))?

It would be also interesting to know how I can revert this operation (i.e. getting the same byte sequence from the translated result).

回答1:

There is a way to somehow do what I want and I can almost go the other way, too but in my eyes it's ugly and incomplete, so I hope it's not the best option I have:

>>> import codecs
>>> decoded = codecs.escape_decode(a)[0].decode()
>>> print(decoded)
ä
-   -"foo"
>>> reencoded = codecs.escape_encode(decoded.encode())
>>> print(reencoded)
(b'\\xc3\\xa4\\n-\\t-"foo"', 11)      <--- qotes are note escaped

来源：https://stackoverflow.com/questions/41160817/how-combine-utf-8-and-unicode-escape-to-correctly-decode-b-xc3-xa4-n-t

标签

string

python-3.x

encoding

escaping

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!