Handle wrongly encoded character in Python unicode string

暗喜 2020-12-06 10:01

I am dealing with unicode strings returned by the python-lastfm library.

I assume somewhere on the way, the library gets the encoding wrong and returns a unicode str

5 answers
  •  甜味超标
    2020-12-06 10:47

    Your unicode string is fine:

    >>> import unicodedata
    >>> unicodedata.name(u"\xfc")
    'LATIN SMALL LETTER U WITH DIAERESIS'
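
For contrast, here is a rough sketch of what a genuinely mis-decoded string would look like. The `mojibake` value is a made-up illustration (UTF-8 bytes misread as Latin-1), not something python-lastfm is known to return; the variable names are mine.

```python
import unicodedata

good = u"Gl\xfcck"           # the string from the question: one code point for the umlaut
mojibake = u"Gl\xc3\xbcck"   # hypothetical: the UTF-8 bytes C3 BC decoded as Latin-1

# Name every non-ASCII character to see what each string really contains.
names_good = [unicodedata.name(c) for c in good if ord(c) > 127]
names_bad = [unicodedata.name(c) for c in mojibake if ord(c) > 127]

# If the string were broken this way, reversing the wrong decode would repair it.
repaired = mojibake.encode("latin-1").decode("utf-8")

print(names_good)
print(names_bad)
print(repaired == good)
```

A healthy string yields a single umlaut name, while the broken one yields two unrelated Latin-1 characters. Since your string shows the single name, the data itself looks fine and only the output step is failing.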
    

    The problem you see at the interactive prompt is that the interpreter cannot tell which encoding to use when writing the string to your terminal, so it falls back to the "ascii" codec, which can only encode ASCII characters. It works fine on my machine because sys.stdout.encoding is "UTF-8" there, most likely because my environment variable settings differ from yours:

    >>> print u'Gl\xfcck'
    Glück
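
To check this on your own machine, something like the following sketch may help. It assumes nothing beyond the standard library, and the names `text`, `ascii_ok` and `utf8_bytes` are only illustrative. It shows that the same string fails under the "ascii" codec but encodes cleanly as UTF-8, and it prints whatever encoding your interpreter picked for stdout.

```python
import sys

text = u"Gl\xfcck"

# The "ascii" codec has no representation for U+00FC, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can encode any code point; U+00FC becomes the two bytes C3 BC.
utf8_bytes = text.encode("utf-8")

# What the interpreter believes the terminal expects (may be None when piped).
print(sys.stdout.encoding)
print(ascii_ok)
print(repr(utf8_bytes))
```

If the first line printed is not UTF-8 (or is None), encoding explicitly before writing, or fixing the terminal's environment settings, should make the print statement work.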
    
