I am trying to read a CSV file containing accented characters (only French and/or Spanish ones) with Python. Based on the Python 2.5 documentation for the csv reader (http://
Looking at the Unicode table for Latin-1, I see the character code 00E9, "LATIN SMALL LETTER E WITH ACUTE". This is the accented character in your sample data. A simple test in Python shows that the UTF-8 encoding of this character differs from its unicode (roughly UTF-16) representation:
```python
>>> u'\u00e9'
u'\xe9'
>>> u'\u00e9'.encode('utf-8')
'\xc3\xa9'
```
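For comparison, here is a rough sketch of the same check in Python 3 (an assumption on my part, since the question is about Python 2.5). There, `str` holds code points and `encode()` returns `bytes`. The sketch also shows the single-byte Latin-1 encoding for contrast:

```python
# Python 3: str holds Unicode code points, bytes holds encoded data
e_acute = "\u00e9"

# The code point itself: U+00E9
print(hex(ord(e_acute)))

# UTF-8 needs two bytes for this character
utf8 = e_acute.encode("utf-8")
print(utf8)

# Latin-1 stores it as a single byte equal to the code point
latin1 = e_acute.encode("latin-1")
print(latin1)
```

The output is `0xe9`, `b'\xc3\xa9'` and `b'\xe9'`. The same character therefore has a different byte sequence depending on the encoding, which is why the encoding a file was saved in matters.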
I suggest you try calling `encode("UTF-8")` on the unicode data before passing it to the special `unicode_csv_reader()`.
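If you can use Python 3, the extra encode step largely disappears, because its `csv` module reads text directly. You only have to choose the right encoding when opening the file. Below is a minimal sketch under that assumption. The sample rows are made up, and an in-memory `io.BytesIO` stands in for the file on disk:

```python
import csv
import io

# Hypothetical sample data: UTF-8 bytes as they might be stored in a CSV file
raw = "nom,ville\nRené,Besançon\nJosé,Cádiz\n".encode("utf-8")

# Decode the bytes as UTF-8; newline="" is what the csv module expects
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")

rows = list(csv.reader(text))
for row in rows:
    print(row)
```

With a real file, the equivalent would be `open("data.csv", encoding="utf-8", newline="")`, where the file name is only a placeholder. If the file was saved as Latin-1, pass `encoding="latin-1"` instead.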
Simply reading the data from a file may hide its encoding, so check the actual byte values, for example by printing `repr()` of the raw data.
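Here is one way to do that check, sketched for Python 3. It uses a hypothetical helper, `guess_decodings`, which is not part of any library. The helper tries a few candidate encodings on the raw bytes and reports which ones succeed. On a real file you would pass it something like `open(path, "rb").read(200)`:

```python
def guess_decodings(raw: bytes, encodings=("utf-8", "latin-1")) -> dict:
    """Hypothetical helper: map each candidate encoding to the decoded
    text, or to None if the bytes are not valid in that encoding."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# UTF-8 bytes: both decode, but Latin-1 produces mojibake ("RenÃ©")
print(repr(b"Ren\xc3\xa9"), guess_decodings(b"Ren\xc3\xa9"))

# Latin-1 bytes: UTF-8 decoding fails on the lone \xe9 byte
print(repr(b"Ren\xe9"), guess_decodings(b"Ren\xe9"))
```

Latin-1 accepts every byte value, so a successful Latin-1 decode proves nothing by itself. A failed UTF-8 decode is the more informative signal.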