UTF-8 problem in python when reading chars

后端未结

关注

 5  1745

温柔的废话 2021-02-06 06:58

I\'m using Python 2.5. What is going on here? What have I misunderstood? How can I fix it?

in.txt:

Stäckövérfløw

code.py

5条回答

粉色の甜心 (楼主)

2021-02-06 07:22

Check this out:

# -*- coding: utf-8 -*- import pprint f = open('unicode.txt','r') for line in f: print line pprint.pprint(line) for i in line: print i, f.close()

It returns this:

Stäckövérfløw
'St\xc3\xa4ck\xc3\xb6v\xc3\xa9rfl\xc3\xb8w'
S t ? ? c k ? ? v ? ? r f l ? ? w

The thing is that the file is just being read as a string of bytes. Iterating over them splits the multibyte characters into nonsensical byte values.

0 讨论(0)

查看其它5个回答

发布评论:

提交评论

加载中...

验证码

看不清?

提交回复