Handle wrongly encoded character in Python unicode string

暗喜 2020-12-06 10:01

I am dealing with unicode strings returned by the python-lastfm library.

I assume somewhere on the way, the library gets the encoding wrong and returns a unicode str

5 answers
  •  没有蜡笔的小新
    2020-12-06 10:44

    I stumbled upon this bug myself while processing a file of German words that, unknown to me, had been encoded in UTF-8. The problem manifested itself when I started processing the words, and some of them wouldn't show the decoding error.

    # python
    Python 2.7.12 (default, Aug 22 2019, 16:36:40) 
    >>> utf8_word = u"Gl\xfcck"
    >>> print("Word read was: {}".format(utf8_word))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)
    

    I solved the error by calling the encode method on the string:

    >>> print("Word read was: {}".format(utf8_word.encode('utf-8')))
    Word read was: Glück
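
As a follow-up to the answer above: calling encode fixes the print error, but it does not repair a unicode string whose characters were already decoded with the wrong codec, which is what the question title describes. The sketch below shows one common repair. It rests on an assumption, since the question does not say which codec was misapplied: that the library decoded UTF-8 bytes as Latin-1. The helper name fix_mojibake is hypothetical and not part of python-lastfm.

```python
def fix_mojibake(text):
    """Undo a wrong Latin-1 decode of UTF-8 bytes (hypothetical helper).

    Assumption: the original bytes were UTF-8 but were decoded as Latin-1.
    If the round trip fails, that assumption does not hold for this string,
    so the input is returned unchanged.
    """
    try:
        # Latin-1 maps code points 0-255 back to the same byte values,
        # which recovers the original bytes; then decode them correctly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# u"Gl\u00c3\u00bcck" is what "Glück" looks like after the wrong decode.
broken = u"Gl\u00c3\u00bcck"
fixed = fix_mojibake(broken)  # fixed == u"Gl\u00fcck"
```

A correctly decoded string such as u"Gl\u00fcck" comes back unchanged, because its Latin-1 bytes are not valid UTF-8. If the wrong codec was something else (cp1252, for example), the codec name in the encode call would need to change accordingly.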
    
