I'm using Python 2.5. What is going on here? What have I misunderstood? How can I fix it?
in.txt:
Stäckövérfløw
Check this out:
code.py:
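# -*- coding: utf-8 -*-
import pprint

f = open('in.txt', 'r')   # in Python 2 each line read is a str, i.e. raw bytes
for line in f:
    print line
    pprint.pprint(line)   # the repr shows the UTF-8 byte escapes
    for i in line:        # this loop visits one byte at a time
        print i,
f.close()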
It prints this:
Stäckövérfløw
'St\xc3\xa4ck\xc3\xb6v\xc3\xa9rfl\xc3\xb8w'
S t ? ? c k ? ? v ? ? r f l ? ? w
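The problem seems to be that the file is read as a plain byte string (str), not as Unicode text. Iterating over a byte string yields individual bytes, so each two-byte UTF-8 character, such as ä (\xc3\xa4), is split into two bytes that are meaningless on their own and show up as "?".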
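One way to avoid this, sketched under the assumption that the file is UTF-8 encoded (which the \xc3\xa4-style byte pairs in the pprint output suggest), is to decode the bytes while reading so that iteration walks over characters. The sketch uses codecs.open, which exists in Python 2.5 as well as Python 3; the names TEXT, write_sample and read_chars are hypothetical and only there to make the example self-contained. In Python 2 the same effect can be had line by line with line.decode('utf-8').

```python
import codecs

# "Stäckövérfløw" written with escapes so the source file stays ASCII-only
TEXT = u'St\u00e4ck\u00f6v\u00e9rfl\u00f8w'


def write_sample(path):
    """Hypothetical helper: write TEXT to path encoded as UTF-8."""
    f = codecs.open(path, 'w', 'utf-8')
    try:
        f.write(TEXT)
    finally:
        f.close()


def read_chars(path):
    """Read a UTF-8 file and return its characters (not its bytes)."""
    f = codecs.open(path, 'r', 'utf-8')  # decodes bytes to Unicode on read
    try:
        chars = []
        for line in f:                    # each line is already Unicode text
            line = line.rstrip(u'\r\n')
            for ch in line:               # iterates over whole characters
                chars.append(ch)
        return chars
    finally:
        f.close()


if __name__ == '__main__':
    write_sample('in.txt')
    chars = read_chars('in.txt')
    print(len(chars))  # 13 characters, versus 17 bytes in the raw file
```

With decoding in place the loop sees 13 characters rather than 17 bytes. Whether print can then display them correctly still depends on the console's output encoding.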