I have a CSV text file encoded in UTF-16 (so as to preserve Unicode characters when others use Excel), but when calling read_csv with pandas 0.9.0, I get this cryptic error:
I think this is a bug: the csv reader seems to pass back an extra empty line at the beginning. It worked for me on Python 2.7.3 and pandas 0.9.1 when I re-encoded the contents as UTF-8 before parsing (fh is the open file handle):
In [36]: pd.read_csv(BytesIO(fh.read().decode('UTF-16').encode('UTF-8')), sep='\t', header=0)
Out[36]:
Int64Index: 50 entries, 0 to 49
Data columns:
Country 43 non-null values
State/City 43 non-null values
Title 43 non-null values
Date 43 non-null values
Catalogue 43 non-null values
Wikipedia Election Page 43 non-null values
Wikipedia Individual Page 43 non-null values
Electoral Institution in Country 43 non-null values
Twitter 43 non-null values
CANDIDATE NAME 1 43 non-null values
CANDIDATE NAME 2 16 non-null values
dtypes: object(11)
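For anyone on Python 3, the same re-encoding step can be sketched roughly as follows. This is only an illustration under assumptions: the helper name utf16_to_utf8_buffer and the two-column sample data are made up here, and the standard csv module stands in for the parser so the snippet runs without pandas.

```python
import csv
import io


def utf16_to_utf8_buffer(raw: bytes) -> io.BytesIO:
    """Decode UTF-16 bytes and return them re-encoded as UTF-8 (hypothetical helper)."""
    # The "utf-16" codec reads the byte-order mark and drops it from the text.
    text = raw.decode("utf-16")
    return io.BytesIO(text.encode("utf-8"))


# Made-up sample standing in for the real tab-separated file's bytes.
sample = "Country\tTitle\nFrance\tPrésident\n".encode("utf-16")

buf = utf16_to_utf8_buffer(sample)

# Parse the UTF-8 buffer as tab-separated text with the standard library.
reader = csv.reader(
    io.TextIOWrapper(buf, encoding="utf-8", newline=""),
    delimiter="\t",
)
rows = list(reader)
```

With pandas installed, the buffer returned by the helper should be usable in place of the BytesIO object in the call above, i.e. pd.read_csv(buf, sep='\t', header=0).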
I reported the bug here: https://github.com/pydata/pandas/issues/2418. Unfortunately, on GitHub master it causes a segfault in the C parser. We'll fix it.
Now, interestingly: https://softwareengineering.stackexchange.com/questions/102205/should-utf-16-be-considered-harmful ;)