with open('result.txt', 'r') as f:
    data = f.read()
print 'What type is my data:'
print type(data)
for i in data:
    print "what is i:"
    print i
pri
data is a bytestring (str type on Python 2). Your loop looks at one byte at a time (non-ASCII characters may be represented using more than one byte in UTF-8).
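To make the byte-versus-character difference concrete, here is a minimal sketch. The sample character is an assumption chosen for illustration, and the code is written to run on both Python 2 and Python 3:

```python
from __future__ import print_function

text = u'\xe3'               # one Unicode character (illustrative sample)
raw = text.encode('utf-8')   # the same character encoded as UTF-8 bytes

print(len(text))             # number of characters
print(len(raw))              # number of bytes

# bytearray yields integer byte values on both Python 2 and Python 3
byte_values = list(bytearray(raw))
print([hex(b) for b in byte_values])
```

Looping over the bytestring visits each of those bytes separately, which is why a single non-ASCII character shows up as several loop iterations.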
Don't call .encode() on bytes:
$ python2
>>> '\xe3'.encode('utf-8') #XXX don't do it
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
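The error is a decode error because Python 2 first tries to turn the bytestring into Unicode with the ascii codec before it can encode anything. The usual direction for bytes is .decode(). A small sketch, assuming the bytes are UTF-8 (the sample bytes below are made up for illustration):

```python
from __future__ import print_function

raw = b'\xc3\xa3'            # sample UTF-8 bytes, as they might come from a file
text = raw.decode('utf-8')   # bytes -> Unicode text
round_trip = text.encode('utf-8')  # Unicode text -> bytes

print(repr(text))
print(round_trip == raw)
```

In short: decode bytes once on input, work with Unicode text, and encode only on output.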
I am trying to read the file and split the words by space and save them into a list.
To work with Unicode text, use the unicode type in Python 2. You could use io.open() to read Unicode text from a file (here's the code that collects all space-separated words into a list):
#!/usr/bin/env python
import io

with io.open('result.txt', encoding='utf-8') as file:
    words = [word for line in file for word in line.split()]
print "\n".join(words)
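If you want to try this without an existing result.txt, the following self-contained sketch writes a small sample file and reads it back. The helper name read_words and the sample contents are hypothetical, introduced only for illustration, and the code runs on both Python 2 and Python 3:

```python
from __future__ import print_function
import io
import os
import tempfile


def read_words(path, encoding='utf-8'):
    """Return all whitespace-separated words in the file as Unicode strings."""
    with io.open(path, encoding=encoding) as file:
        return [word for line in file for word in line.split()]


# Create a temporary file with made-up sample text, then read it back.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
try:
    with io.open(path, 'w', encoding='utf-8') as file:
        file.write(u'caf\xe9 na\xefve\nr\xe9sum\xe9\n')
    words = read_words(path)
finally:
    os.remove(path)

print(len(words))
```

Note that line.split() with no argument splits on any run of whitespace, not only on single spaces, so newlines and tabs never end up inside a word.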