Question
I'm loading query results into a pandas DataFrame. When I print the DataFrame, the object values appear correct, but when I call to_csv on it, the CSV output contains only the first character of every string/object value.
import pandas
df = pandas.DataFrame({'a': [u'u\x00s\x00']})
df.to_csv('test.csv')
I've also tried passing an explicit encoding to to_csv:
df.to_csv('test_encoded.csv', encoding='utf-8')
But I get the same results:
>>> print df
a
0 us
(output in csv file)
u
For reference, I'm connecting to a Vertica database and using the following setup:
- OS: Mac OS X Yosemite (10.10.5)
- Python 2.7.10 |Anaconda 2.3.0 (x86_64)| (default, Sep 15 2015, 14:29:08)
- pyodbc 3.0.10
- pandas 0.16.2
- ODBC: Vertica ODBC 6.1.3
Any help figuring out how to pass the entire object string using the to_csv function in pandas would be greatly appreciated.
Answer 1:
I ran into the same problem and found the post "UTF-32 in Python".
To fix it, I believe you need to replace every '\x00' with an empty string. I managed to write a correct CSV with the code below:
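# Map the NUL code point (0x00) to an empty string
fixer = dict.fromkeys([0x00], u'')
# Strip NUL characters from every value in column 'a'
df['a'] = df['a'].map(lambda x: x.translate(fixer))
df.to_csv('test.csv')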
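The lambda above assumes every value in the column is a text string; a None or NaN value has no translate method and would raise an error. Below is a minimal sketch of a more defensive variant. The helper name strip_nul is hypothetical (it is not part of pandas), and it only touches text values (unicode on Python 2, str on Python 3), passing everything else through unchanged.

```python
NUL = u'\x00'
TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def strip_nul(value):
    """Hypothetical helper: remove NUL characters from text values.

    Non-text values (None, NaN, numbers) are returned unchanged.
    """
    if isinstance(value, TEXT_TYPE):
        return value.replace(NUL, u'')
    return value


raw = u'u\x00s\x00'
cleaned = strip_nul(raw)

# repr() exposes the hidden NUL characters that a plain print hides
print(repr(raw))
print(repr(cleaned))
```

With the DataFrame from the question, this could be applied as df['a'] = df['a'].map(strip_nul) before calling to_csv.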
To solve the underlying problem with Vertica, I had to change the encoding to UTF-16 in the file /Library/Vertica/ODBC/lib/vertica.ini, using the configuration below:
[Driver]
ErrorMessagesPath=/Library/Vertica/ODBC/messages/
ODBCInstLib=/usr/lib/libiodbcinst.dylib
DriverManagerEncoding=UTF-16
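One plausible explanation for the stray '\x00' characters, which is an assumption and not something confirmed in this thread, is an encoding mismatch: the driver emits UTF-32 while the driver manager reads the bytes as UTF-16. The sketch below only illustrates that effect in pure Python; it does not touch ODBC, and the real driver behavior may differ.

```python
text = u'us'

# Assumption for illustration: the driver encodes text as UTF-32 (little-endian)
utf32_bytes = text.encode('utf-32-le')

# Reading those bytes as UTF-16 turns each pair of padding bytes into a NUL character
misread = utf32_bytes.decode('utf-16-le')

# Decoding with the matching encoding recovers the original text
correct = utf32_bytes.decode('utf-32-le')

print(repr(misread))
print(repr(correct))
```

The misread value here is exactly the u'u\x00s\x00' string from the question, which fits with the fix of making both sides agree on UTF-16.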
Best regards,
Anderson Neves
Source: https://stackoverflow.com/questions/32703664/python-pandas-to-csv-output-returns-single-character-for-string-object-values