Error handling ñ in pandas

纵然是瞬间 submitted on 2021-02-08 07:22:06

Question


I am writing a script that reads a csv file and uses the pandas library to create a pivot table.

I keep receiving an error ('utf-8' codec can't decode byte 0xf1 in position 6: invalid continuation byte) that I have traced back to the 'ñ' in one of the names in the csv file.

I have searched for hours for a way to handle this. I have tried passing an encoding to pandas.read_csv, with no luck.

Here is my code:

import os
import pandas

# wd (working directory) and datesuffix are defined earlier in the script
df = pandas.read_csv(
    os.path.join(wd, 'Birthday_%s.csv' % datesuffix),
    encoding='utf-8')  # this is the call that raises the error
pivot = pandas.pivot_table(df,
    index=['ClientID', 'ClientName', 'Branch'],
    values=['EmailAddress'],
    aggfunc='count',
    margins=True)
pivotlocation = os.path.join(wd, 'BirthdayPivot.csv')
pivot.to_csv(pivotlocation)

Any help would be hugely appreciated.

EDIT: Here is the line in question that is causing the issue.

ClientID | ClientName    | Branch        | Name     | EmailAddress
5555     | ExampleClient | ExampleBranch | Avendaño | email@email.com

It is the Name column (containing 'Avendaño') that seems to be causing the issue.
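
The error can be reproduced without the csv file. The sketch below assumes the file stores 'ñ' as the single byte 0xf1, which matches the byte named in the error message; the byte string is a made-up stand-in for the real file contents, not data taken from the file.

```python
# Hypothetical stand-in for the Name field as stored on disk, assuming the
# file was saved as Latin-1, where 'ñ' is the single byte 0xf1.
raw_name = "Avendaño".encode("latin-1")

# Decoding those bytes as UTF-8 fails: 0xf1 announces a multi-byte sequence,
# but the byte that follows it ('o') is not a continuation byte.
try:
    raw_name.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as err:
    utf8_error = err

# Decoding with the encoding the bytes were written in recovers the name.
name_from_latin1 = raw_name.decode("latin-1")

# For comparison, UTF-8 stores 'ñ' as two bytes (0xc3 0xb1).
utf8_bytes = "Avendaño".encode("utf-8")

print(repr(raw_name))
print(utf8_error)
print(name_from_latin1)
print(repr(utf8_bytes))
```

If the real file behaves the same way, that suggests it was not saved as UTF-8 in the first place.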


Answer 1:


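The byte 0xf1 in the error message is 'ñ' in Latin-1 (ISO-8859-1), whereas UTF-8 encodes 'ñ' as the two bytes 0xc3 0xb1. The file is therefore probably encoded as 'latin-1' rather than UTF-8, so you may want to try: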

# Decode the file as Latin-1, where the byte 0xf1 maps to 'ñ'
df = pandas.read_csv(
    os.path.join(wd, 'Birthday_%s.csv' % datesuffix),
    encoding='latin-1')
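
If the encoding varies from file to file, one option is to try UTF-8 first and fall back to Latin-1 only when decoding fails. The following is a minimal sketch of that idea, not part of the original answer; decode_with_fallback and read_csv_with_fallback are hypothetical helper names, and the approach assumes the file is small enough to read into memory.

```python
import io


def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Hypothetical helper: decode bytes with the first encoding that works.

    Returns a (text, encoding_used) tuple.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def read_csv_with_fallback(path, **read_csv_kwargs):
    """Hypothetical helper: read a csv file whose encoding is not known."""
    import pandas  # imported here so decode_with_fallback works without pandas

    with open(path, "rb") as handle:
        text, encoding = decode_with_fallback(handle.read())
    # The text is already decoded, so read_csv needs no encoding argument.
    return pandas.read_csv(io.StringIO(text), **read_csv_kwargs), encoding
```

Latin-1 assigns a character to every possible byte, so it never raises a decoding error. That makes it a workable last resort, but it also means a file in some other single-byte encoding would load without complaint and show the wrong characters, so the result is worth checking by eye.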


Source: https://stackoverflow.com/questions/28548411/error-handling-%c3%b1-in-pandas
