Special characters don't display correctly when splitting

Posted by 主宰稳场 on 2019-12-12 05:59:17

Question


When I read a line from a text file, such as this one:

présenté alloué ééé ààà tué

and print it in the terminal, it displays correctly. But when I split it with a space as the separator and print the result, I get this:

['pr\xc3\xa9sent\xc3\xa9', 'allou\xc3\xa9', '\xc3\xa9\xc3\xa9\xc3\xa9', '\xc3\xa0\xc3\xa0\xc3\xa0', 'tu\xc3\xa9\n']

This is all I use to read the text file:

f = open("test.txt")
l = f.readline()
f.close()
print l.split(" ")

Can someone help me?


Answer 1:


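Printing a list is not the same as printing its elements. When a list is printed, each element is shown through its repr(), which escapes non-ASCII bytes as \x.. sequences. Printing an element on its own sends the bytes to the terminal, which renders them as characters.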

s = "présenté alloué ééé ààà tué"
print s.split(" ")
for x in s.split(" "):
    print x

Output:

['pr\xc3\xa9sent\xc3\xa9', 'allou\xc3\xa9', '\xc3\xa9\xc3\xa9\xc3\xa9', '\xc3\xa0\xc3\xa0\xc3\xa0', 'tu\xc3\xa9']
présenté
alloué
ééé
ààà
tué
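The same effect can be reproduced in Python 3, where bytes objects play the role of Python 2 byte strings. This is only a sketch under that assumption; the variable names are illustrative and not part of the question's code.

```python
# Python 3 sketch: bytes objects stand in for Python 2 byte strings.
line = "présenté alloué ééé ààà tué".encode("utf-8")
words = line.split(b" ")

# Displaying a list uses repr() of every item, and repr() escapes
# non-ASCII bytes, which is where the \xc3\xa9 sequences come from.
as_list = str(words)
rebuilt = "[" + ", ".join(repr(w) for w in words) + "]"
print(as_list == rebuilt)

# Decoding each item to text and printing it on its own shows the
# real characters again.
for w in words:
    print(w.decode("utf-8"))
```

The data never changes: \xc3\xa9 is simply the two-byte UTF-8 encoding of é, shown in escaped form.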



Answer 2:


Python 3.* solution: all you have to do is specify the encoding when opening the file

f = open("test.txt", encoding='utf-8')
l = f.readline()
f.close()
print(l.split(" "))

And you'll get (the last item keeps a trailing '\n' if the line ends with a newline)

['présenté', 'alloué', 'ééé', 'ààà', 'tué']
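A slightly more defensive variant of the Python 3 code, as a sketch: it uses a with block so the file is closed even on error, and split() with no argument so the trailing newline is dropped. The helper name read_words and the temporary demo file are assumptions made so the example runs on its own.

```python
import os
import tempfile


def read_words(path, encoding="utf-8"):
    """Return the whitespace-separated words of the first line of a file."""
    with open(path, encoding=encoding) as f:
        line = f.readline()
    # split() without arguments splits on any whitespace and
    # discards the trailing newline.
    return line.split()


# Hypothetical demo file standing in for test.txt.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("présenté alloué ééé ààà tué\n")
    demo_path = tmp.name

words = read_words(demo_path)
os.remove(demo_path)
print(words)
```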

Python 2.* solution:

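import codecs

# codecs.open decodes the file as UTF-8 and returns unicode strings
f = codecs.open(r"D:\Source Code\voc-git\test.txt", mode='r', encoding='utf-8')
l = f.read()
f.close()
# Print the words one at a time rather than printing the whole list
for word in l.split(" "):
    print(word)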
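What the decoding step does can be made explicit by reading the raw bytes and decoding them by hand. This is a minimal sketch, not the answer's method: the temporary file and its shortened contents are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical demo file containing UTF-8 encoded bytes.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as tmp:
    tmp.write(b"pr\xc3\xa9sent\xc3\xa9 allou\xc3\xa9 tu\xc3\xa9\n")

# Binary mode returns the bytes exactly as stored on disk.
with open(demo_path, "rb") as f:
    raw = f.read()
os.remove(demo_path)

# Decoding turns each two-byte sequence \xc3\xa9 into one character.
text = raw.decode("utf-8")
words = text.split()

# The first word is 10 bytes long but only 8 characters long.
print(len(raw.split()[0]), len(words[0]))
```

Splitting after decoding means every item is text, so length and slicing work on characters rather than bytes.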


Source: https://stackoverflow.com/questions/42109285/special-caracters-dont-display-correctly-when-splitting
