Decoding double-encoded UTF-8 in Python

野性不改 2020-12-05 05:04

I've got a problem with strings that I get from one of my clients over xmlrpc. He sends me UTF-8 strings that are encoded twice :( so when I get them in python I have an uni

3 Answers
  •  臣服心动
    2020-12-05 05:53

    >>> weird = u'Rafa\xc5\x82'
    >>> weird.encode('latin1').decode('utf8')
    u'Rafa\u0142'
    

    latin1 is just an abbreviation for Richie's nuts'n'bolts method.

    It is very curious that the seriously under-described raw_unicode_escape codec gives the same result as latin1 in this case. Do they always give the same result? If so, why have such a codec? If not, it would be preferable to know for sure exactly how the OP's client did the transformation from 'Rafa\xc5\x82' to u'Rafa\xc5\x82', and then to reverse that process exactly. Otherwise we might come unstuck if different data crops up before the double encoding is fixed.
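
One way to probe whether the two codecs always agree is to try them on a string that contains a code point above U+00FF. The following is a minimal Python 3 sketch, not the OP's or the client's actual code: the variable names are illustrative, and it assumes the corruption came from UTF-8 bytes being misread as Latin-1, which the thread does not confirm.

```python
# Python 3 sketch; names and sample data are illustrative assumptions.
original = "Rafa\u0142"  # the intended text

# Reproduce the presumed corruption: UTF-8 bytes misread as Latin-1.
weird = original.encode("utf-8").decode("latin-1")

# Every code point in `weird` is below U+0100, so both codecs
# produce the same bytes here.
via_latin1 = weird.encode("latin-1")
via_raw = weird.encode("raw_unicode_escape")
same_for_low_code_points = via_latin1 == via_raw

# Reversing the presumed corruption recovers the intended text.
repaired = via_latin1.decode("utf-8")

# Above U+00FF the codecs diverge: raw_unicode_escape writes a
# backslash-u escape sequence, while latin-1 cannot encode the
# character and raises UnicodeEncodeError.
raw_bytes = original.encode("raw_unicode_escape")
try:
    original.encode("latin-1")
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True
```

Under these assumptions the codecs agree only while every character fits in one byte, so latin1 states the intent of the repair more directly and fails loudly on data that was not corrupted in the presumed way.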
