UTF-8 problem in Python when reading chars

温柔的废话 2021-02-06 06:58

I'm using Python 2.5. What is going on here? What have I misunderstood? How can I fix it?

in.txt:

Stäckövérfløw

code.py

5 answers
  •  感动是毒
    2021-02-06 07:38

    for i in line:
        print i,
    

    When you read the file in Python 2, each line comes back as a byte string (str), so the for loop iterates over one byte at a time. That breaks UTF-8 text, because every non-ASCII character is encoded as two or more bytes, and print i, emits those bytes separately. To work with unicode objects, where the basic unit is the character, open the file with codecs.open:

    import codecs
    f = codecs.open('in', 'r', 'utf8')
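
To make the difference concrete, here is a minimal sketch (not from the original post). It writes the sample word to a temporary file, a hypothetical stand-in for the input file, then reads it back once as raw bytes and once through codecs.open. The variable names are illustrative, and the code avoids Python 2-only syntax, so it should also run on Python 3.

```python
import codecs
import os
import tempfile

# Sample word from the question, written with escapes so this source stays ASCII.
text = u"St\u00e4ck\u00f6v\u00e9rfl\u00f8w"

# Write the sample as UTF-8 to a temporary file (hypothetical stand-in for the input).
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
out = codecs.open(path, "w", "utf8")
out.write(text)
out.close()

# Binary read: the result is raw bytes, so each accented letter counts twice.
raw = open(path, "rb")
byte_data = raw.read()
raw.close()

# Decoding read: codecs.open returns text, one item per character.
decoded = codecs.open(path, "r", "utf8")
chars = decoded.read()
decoded.close()

os.remove(path)

byte_count = len(byte_data)
char_count = len(chars)
print("bytes: %d, characters: %d" % (byte_count, char_count))
# prints "bytes: 17, characters: 13"
```

The four accented letters take two bytes each in UTF-8, which is why the byte count is 17 while the character count is 13.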
    

    If sys.stdout doesn't already have the appropriate encoding set, you may have to wrap it:

    import sys
    sys.stdout = codecs.getwriter('utf8')(sys.stdout)
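
The effect of such a wrapper can be checked without touching the real stdout. The sketch below is an illustration under stated assumptions, not part of the original answer: a binary temporary file stands in for a byte-oriented output stream, and the variable names are made up for the example.

```python
import codecs
import os
import tempfile

# Sample word from the question, written with escapes so this source stays ASCII.
text = u"St\u00e4ck\u00f6v\u00e9rfl\u00f8w"

# Assumption: a binary temp file plays the role of a byte-oriented stdout,
# so that what was written can be inspected afterwards.
fd, path = tempfile.mkstemp()
os.close(fd)
byte_stream = open(path, "wb")

# The wrapper accepts unicode text and encodes it to UTF-8 before writing.
writer = codecs.getwriter("utf8")(byte_stream)
writer.write(text)
byte_stream.close()

# Read the raw bytes back and decode them to confirm the round trip.
check = open(path, "rb")
written = check.read()
check.close()
os.remove(path)

round_trip = written.decode("utf8")
print(round_trip == text)
# prints "True"
```

The wrapper only handles encoding on the way out; whether the terminal or web client then displays the bytes correctly still depends on it expecting UTF-8.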
    
