pandas to_csv: ascii can't encode character

后端未结


I'm trying to read and write a dataframe to a pipe-delimited file. Some of the characters are non-Roman letters (`, ç, ñ, etc.). But it breaks when I try to write out the a

4 Answers

梦毁少年i

2020-12-15 08:28
You have some characters that are not ASCII and therefore cannot be encoded as you are trying to do. I would just use utf-8 as suggested in a comment.
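As a minimal sketch of what goes wrong (the sample string is made up for illustration), encoding a non-ASCII character with the `ascii` codec raises `UnicodeEncodeError`, while `utf-8` handles it:

```python
# Hypothetical sample value containing a non-ASCII character (ñ).
text = "señor"

try:
    text.encode("ascii")  # the ascii codec only covers code points 0-127
except UnicodeEncodeError as exc:
    error_message = str(exc)

# utf-8 can represent every Unicode character, so this succeeds.
utf8_bytes = text.encode("utf-8")
print(error_message)
print(utf8_bytes)
```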

To check which lines are causing the issue you can try something like this:
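```
def is_not_ascii(value):
    # True only for strings with a character outside ASCII; None/NaN cells count as clean.
    return isinstance(value, str) and any(ord(ch) >= 128 for ch in value)

# Show the rows whose value in column `col` contains non-ASCII characters.
df[df[col].apply(is_not_ascii)]
```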
You'll need to specify the column col you are testing.
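If you don't know which column is the culprit, the same idea can be applied to every column at once. This is only a sketch: the helper name `non_ascii_mask` and the sample frame are hypothetical, and it assumes Python 3.7+ for `str.isascii`.

```python
import pandas as pd

def non_ascii_mask(df):
    """Return a boolean Series marking rows that hold any non-ASCII string cell."""
    def cell_is_non_ascii(value):
        # Non-string cells (numbers, None, NaN) are treated as clean.
        return isinstance(value, str) and not value.isascii()

    # Test every cell column by column, then collapse to one flag per row.
    flags = df.apply(lambda column: column.map(cell_is_non_ascii))
    return flags.any(axis=1)

# Made-up sample data for illustration.
sample = pd.DataFrame({"name": ["Jose", "José", None], "n": [1, 2, 3]})
mask = non_ascii_mask(sample)
print(sample[mask])
```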
逝去的感伤

2020-12-15 08:46

Another solution is to use string functions encode/decode with the 'ignore' option, but it will remove non-ascii characters:

```
df['text'] = df['text'].apply(lambda x: x.encode('ascii', 'ignore').decode('ascii'))
```
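To see what the `'ignore'` option actually does to your data, here is a small sketch in plain Python. The helper `strip_non_ascii` and the sample values are hypothetical; the `isinstance` guard is an addition so that missing or numeric cells don't raise `AttributeError`.

```python
def strip_non_ascii(value):
    """Drop non-ASCII characters from strings; pass other values through unchanged."""
    if not isinstance(value, str):
        return value
    # 'ignore' silently discards anything the ascii codec cannot encode.
    return value.encode("ascii", "ignore").decode("ascii")

# Made-up sample values, including non-string cells.
cleaned = [strip_non_ascii(v) for v in ["señor", "façade", None, 42]]
print(cleaned)  # ['seor', 'faade', None, 42]
```

Note that the accented letters are deleted, not replaced with their unaccented equivalents, so this loses information.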

面向向阳花

2020-12-15 08:48
Check the answer here

It's a much simpler solution:
```
newdf.to_csv('filename.csv', encoding='utf-8')
```
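For the pipe-delimited case in the question, a round trip might look like the sketch below. The file name, column names, and sample data are made up for illustration; `sep`, `index`, and `encoding` are standard `to_csv`/`read_csv` parameters. On Python 3, pandas already defaults to UTF-8 when writing, so passing `encoding` explicitly mainly matters on Python 2 or when you want to be unambiguous.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data containing non-ASCII characters.
df = pd.DataFrame({
    "name": ["José", "François"],
    "city": ["A Coruña", "Besançon"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv")  # hypothetical file name
    # Write pipe-delimited text, encoded as UTF-8.
    df.to_csv(path, sep="|", index=False, encoding="utf-8")
    # Read it back with the same delimiter and encoding.
    restored = pd.read_csv(path, sep="|", encoding="utf-8")

print(restored)
```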
无人及你

2020-12-15 08:54

Try this, it works

```
newdf.to_csv('filename.csv', encoding='utf-8')
```
