Python: Convert Unicode to ASCII without errors for CSV file

二次信任 提交于 2019-11-29 10:43:57
Lennart Regebro

Correct, ñ is not a valid ASCII character, so you can't encode it to ASCII. So you can, as your code does above, ignore them. Another way, namely to remove the accents, you can find here: What is the best way to remove accents in a Python unicode string?

But note that both techniques can result in bad effects, like making words actually mean something different, etc. So the best is to keep the accents. And then you can't use ASCII, but you can use another encoding. UTF-8 is the safe bet. Latin-1 or ISO-88591-1 is common one, but it includes only Western European characters. CP-1252 is common on Windows, etc, etc.

So just switch "ascii" for whatever encoding you want.


Your actual code, according to your comment is:

writer.writerow([s.encode('utf8') if type(s) is unicode else s for s in row]) 

where

row = (56, u"LIMPIADOR BA\xd1O 1'5 L")

Now, I believe that should work, but apparently it doesn't. I think unicode gets passed into the cvs writer by mistake anyway. Unwrap that long line to it's parts:

col1, col2 = row # Use the names of what is actually there instead
row = col1, col2.encode('utf8')
writer.writerow(row) 

Now your real error will not be hidden by the fact that you stick everything in the same line. This could also probably have been avoided if you had included a proper traceback.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!