UnicodeDecodeError, invalid continuation byte

忘掉有多难 2020-11-22 08:25

Why does the code below fail? And why does it succeed with the "latin-1" codec?

o = "a test of \\xe9 char" #I want this to remain a string as thi         


        
10 Answers
  •  忘掉有多难
    2020-11-22 08:58

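    It is invalid UTF-8. In UTF-8, the byte 0xE9 opens a three-byte sequence and must be followed by two continuation bytes in the range 0x80-0xBF; here it is followed by an ordinary space, so the decoder reports an invalid continuation byte. In ISO Latin-1, 0xE9 on its own is the e-acute character (é), which is why decoding succeeds with that codeset.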
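
    To make the difference concrete, here is a minimal sketch. It assumes the string from the question arrives as raw bytes; the variable names are illustrative and not taken from the original post.

```python
# Assumption: the text from the question, held as raw bytes with a lone 0xE9.
raw = b"a test of \xe9 char"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xE9 announces a multi-byte sequence, but the next byte is a space.
    utf8_error = exc.reason      # the decoder's explanation
    error_position = exc.start   # offset of the offending byte

# Latin-1 maps every byte to exactly one character, so this cannot fail.
latin1_text = raw.decode("latin-1")

print(utf8_error, error_position, latin1_text)
```

    Latin-1 accepts any byte sequence, so a successful decode with it does not prove the data was really Latin-1.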

    If you don't know which codeset your incoming strings use, you're in a bit of trouble. It is best to choose a single codeset (ideally UTF-8) for your protocol or application and then simply reject any input that doesn't decode.
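
    One way to enforce a single codeset at the boundary might look like the sketch below. The helper name `decode_strict_utf8` is hypothetical, and raising `ValueError` is just one possible way to reject input.

```python
def decode_strict_utf8(data: bytes) -> str:
    """Hypothetical helper: accept only valid UTF-8, reject everything else."""
    try:
        return data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        # Report where decoding failed instead of guessing another codec.
        raise ValueError(
            f"input is not valid UTF-8 (byte offset {exc.start})"
        ) from exc
```

    For example, `decode_strict_utf8("é".encode("utf-8"))` returns the text unchanged, while `decode_strict_utf8(b"\xe9 ")` raises `ValueError`.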

    If you can't do that, you'll need heuristics.
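
    A very simple heuristic, sketched here under the assumption that the input is either UTF-8 or Latin-1, is to try the stricter codec first and fall back to the permissive one. The function name `guess_decode` and the candidate order are illustrative choices, not a standard recipe.

```python
def guess_decode(data: bytes, candidates=("utf-8", "latin-1")):
    """Hypothetical helper: return (text, codec) for the first codec that works.

    UTF-8 goes first because it rejects most non-UTF-8 input; Latin-1 goes
    last because it accepts every byte and therefore never fails.
    """
    for codec in candidates:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError("none of the candidate codecs could decode the input")


text, codec = guess_decode(b"a test of \xe9 char")
print(codec, text)
```

    Because the Latin-1 fallback always succeeds, this can silently produce wrong characters when the data is actually in some other codeset, so it remains a guess rather than a detection.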
