Question
I am writing a script that reads a CSV file and uses the pandas library to create a pivot table.
I keep receiving an error ('utf-8' codec can't decode byte 0xf1 in position 6: invalid continuation byte) that I have traced back to the 'ñ' in one of the names in the CSV file.
I have searched for hours trying to find a way to handle this. I have tried specifying the encoding in my pandas.read_csv call, with no luck.
Here is my code:
import os
import pandas

# wd and datesuffix are defined earlier in the script
df = pandas.read_csv(
    os.path.join(wd, 'Birthday_%s.csv' % datesuffix),
    encoding='utf-8')
pivot = pandas.pivot_table(df,
                           index=['ClientID', 'ClientName', 'Branch'],
                           values=['EmailAddress'],
                           aggfunc='count',
                           margins=True)
pivotlocation = os.path.join(wd, 'BirthdayPivot.csv')
pivot.to_csv(pivotlocation)
Any help would be hugely appreciated.
EDIT: Here is the line in question that is causing the issue.
ClientID | ClientName | Branch | Name | EmailAddress
5555 | ExampleClient | ExampleBranch | Avendaño | email@email.com
It is the name column (containing 'Avendaño') that seems to be causing the issues.
Answer 1:
Byte 0xf1 is 'ñ' in Latin-1 (ISO-8859-1), which suggests the file was not saved as UTF-8. The proper encoding may be 'latin-1', so you may want to try:
df = pandas.read_csv(
    os.path.join(wd, 'Birthday_%s.csv' % datesuffix),
    encoding='latin-1')
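If you are not sure which encoding a given file uses, one option is to try a few candidates in order. The following is only a sketch under some assumptions: read_csv_with_fallback is a hypothetical helper (not part of pandas), the candidate list is a guess that suits files exported on Windows, and it expects a file path rather than an open file object. Note that 'latin-1' maps every possible byte to some character, so it never raises and should come last; a file in some other single-byte encoding would still load, just with the wrong characters.

```python
import os
import tempfile

import pandas


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1"), **kwargs):
    """Hypothetical helper: try each encoding in turn.

    Returns a (DataFrame, encoding_used) tuple. Extra keyword
    arguments are passed through to pandas.read_csv.
    """
    last_error = None
    for enc in encodings:
        try:
            return pandas.read_csv(path, encoding=enc, **kwargs), enc
        except UnicodeDecodeError as err:
            # This encoding cannot decode the file; remember why and move on.
            last_error = err
    if last_error is None:
        raise ValueError("no encodings were supplied")
    raise last_error


if __name__ == "__main__":
    # 'ñ' is the single byte 0xf1 in Latin-1, which is not valid UTF-8 here.
    raw = "Avendaño".encode("latin-1")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        print(err)

    # Write a small Latin-1 sample file and read it back with the helper.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.csv")
        with open(sample, "wb") as fh:
            fh.write("Name,EmailAddress\nAvendaño,email@email.com\n".encode("latin-1"))
        df, used = read_csv_with_fallback(sample)
        print(used)
        print(df)
```

Once you know which encoding works, it is simpler to pass it directly to pandas.read_csv as in the answer above.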
Source: https://stackoverflow.com/questions/28548411/error-handling-%c3%b1-in-pandas