Encoding errors with StringIO and read_csv pandas

荒凉一梦 submitted on 2020-01-03 05:29:08

Question


I am using an API to get some data. The API returns the data as Unicode text (semicolon-separated values), not as a dictionary / JSON object.

Getting the data:

import requests

data = []
for url in api_call_list:  # api_call_list: the list of API URLs to fetch
    data.append(requests.get(url))

The data looks like this:

>>> data[0].text
u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Taylor Swift;33100;0.83;0.20\r\n'

>>> data[1].text
u'Country;Celebrity;Song Volume;CPP;Index\r\nus;Rihanna;28100;0.76;0.33\r\n'

I use this code to convert this to a dataframe:

from io import StringIO
import pandas as pd

pd.concat([pd.read_csv(StringIO(d.text), sep=";") for d in data])

This works fine except when the results contain non-English characters, especially Korean, Chinese, or Japanese, which come out completely garbled. I tried passing the encoding argument to read_csv with utf_8, cp1252, and iso-8859-1 as values. None of these worked.

How should I read this data correctly?


Answer 1:


After some analysis and research, I was able to identify the problem. The text returned by the API was being decoded with the wrong encoding (or the response did not declare the correct one), but the encoding can be set on the response object. So I added a line that sets the encoding on each response from requests:

data = []
for url in api_call_list:
    r = requests.get(url)
    r.encoding = 'utf-8'  # decode r.text as UTF-8 instead of the guessed encoding
    data.append(r)
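
To illustrate why setting r.encoding matters, here is a minimal sketch of how the garbling can arise. It assumes the API sends UTF-8 bytes without declaring a charset, in which case requests falls back to ISO-8859-1 for text content types. The sample row is made up, and the bytes are built locally so no network call is needed.

```python
# Simulated response body: UTF-8 bytes as an API might send them.
# The row content is a made-up example, not data from the original API.
raw = "Country;Celebrity\r\nkr;방탄소년단\r\n".encode("utf-8")

# Decoding with the wrong codec does not raise an error;
# it silently produces garbled text (mojibake).
garbled = raw.decode("iso-8859-1")

# Decoding with the codec the server actually used restores the text.
correct = raw.decode("utf-8")

print(garbled == correct)        # False
print(correct.splitlines()[1])   # kr;방탄소년단
```

Because the wrong decoding never fails, the problem only shows up later as unreadable characters in the dataframe.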

I then added the encoding argument to read_csv:

pd.concat([pd.read_csv(StringIO(d.text), sep=";", encoding='utf-8') for d in data])

That fixed it. The documentation is here: http://docs.python-requests.org/en/master/user/quickstart/
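
A possible alternative, sketched here under assumptions rather than taken from the original answer, is to skip r.text entirely and hand the raw bytes (r.content) to read_csv through BytesIO. In that case the encoding argument of read_csv does the decoding. With StringIO the text is already decoded, so the effective part of the fix above is most likely the r.encoding line. The payloads below are hypothetical stand-ins for r.content, assuming the API sends UTF-8.

```python
from io import BytesIO

import pandas as pd

# Stand-ins for r.content of two responses (hypothetical sample rows,
# encoded as UTF-8 on the assumption that this is what the API sends).
payloads = [
    "Country;Celebrity;Song Volume;CPP;Index\r\n"
    "us;Taylor Swift;33100;0.83;0.20\r\n".encode("utf-8"),
    "Country;Celebrity;Song Volume;CPP;Index\r\n"
    "kr;아이유;28100;0.76;0.33\r\n".encode("utf-8"),
]

# read_csv decodes the bytes itself, using the encoding given here.
frames = [pd.read_csv(BytesIO(p), sep=";", encoding="utf-8") for p in payloads]
df = pd.concat(frames, ignore_index=True)

print(df["Celebrity"].tolist())
```

With real responses, the list comprehension would iterate over the responses and pass BytesIO(r.content) instead of the sample payloads.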



Source: https://stackoverflow.com/questions/43598020/encoding-errors-with-stringio-and-read-csv-pandas
