pandas to_csv: ascii can't encode character

悲哀的现实 2020-12-15 07:50

I'm trying to read and write a dataframe to a pipe-delimited file. Some of the characters are non-Roman letters (`, ç, ñ, etc.). But it breaks when I try to write out the a

4 answers
  • 2020-12-15 08:28

    Some of your characters are not ASCII, so they cannot be encoded with the ASCII codec you are writing with. I would just use UTF-8, as suggested in a comment.
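To see the failure in isolation, here is a minimal sketch using a plain Python string rather than pandas (the sample word is made up for illustration). The ASCII codec rejects any code point of 128 or above, while UTF-8 encodes the same character as a multi-byte sequence.

```python
word = "Muñoz"  # hypothetical sample value
ascii_error = None

try:
    # The ASCII codec only covers code points 0-127, so 'ñ' (U+00F1) is rejected.
    word.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print(exc)  # 'ascii' codec can't encode character '\xf1' in position 2: ordinal not in range(128)

# UTF-8 can represent every Unicode character; 'ñ' becomes two bytes.
utf8_bytes = word.encode("utf-8")
print(utf8_bytes)  # b'Mu\xc3\xb1oz'

# Decoding with the same codec restores the original text.
assert utf8_bytes.decode("utf-8") == word
```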

    To find which rows are causing the issue, you can try something like this:

    def is_not_ascii(value):
        # Missing values in a pandas object column are float NaN, not None, so test for str explicitly
        return isinstance(value, str) and any(ord(ch) >= 128 for ch in value)
    
    df[df[col].apply(is_not_ascii)]
    

    You'll need to set col to the name of the column you are testing.
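If you don't know which column is the culprit, the same check can be run over every column at once. This is a sketch that assumes the offending values are ordinary str cells; the helper name non_ascii_rows and the sample data are hypothetical, and str.isascii needs Python 3.7 or later.

```python
import numpy as np
import pandas as pd


def non_ascii_rows(df):
    """Return the rows of df with at least one non-ASCII string cell (hypothetical helper)."""
    # Build a boolean frame: True where a cell is a str containing a non-ASCII character.
    # Non-string cells (numbers, NaN, None) are treated as ASCII-safe.
    mask = df.apply(
        lambda column: column.map(lambda v: isinstance(v, str) and not v.isascii())
    )
    # Keep a row if any of its cells was flagged.
    return df[mask.any(axis=1)]


df = pd.DataFrame({
    "name": ["François", "Smith", np.nan],
    "city": ["Paris", "Zürich", "Lima"],
    "count": [1, 2, 3],
})

print(non_ascii_rows(df))  # shows rows 0 and 1
```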

  • 2020-12-15 08:46

    Another solution is to use the string methods encode/decode with the 'ignore' error handler, but be aware that it silently drops every non-ASCII character:

    df['text'] = df['text'].apply(lambda x: x.encode('ascii', 'ignore').decode('ascii'))
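To make the trade-off concrete, here is a sketch on hypothetical sample data comparing the 'ignore' handler with 'replace', using pandas' vectorized Series.str.encode and Series.str.decode. Those .str methods also leave missing values as NaN, whereas calling .encode directly on a float NaN would raise an AttributeError.

```python
import numpy as np
import pandas as pd

s = pd.Series(["Muñoz", "François", "Smith", np.nan])  # hypothetical sample data

# 'ignore' drops every character the ASCII codec cannot represent.
ignored = s.str.encode("ascii", "ignore").str.decode("ascii")

# 'replace' substitutes '?', so you can at least see where text was lost.
replaced = s.str.encode("ascii", "replace").str.decode("ascii")

print(ignored.tolist())   # ['Muoz', 'Franois', 'Smith', nan]
print(replaced.tolist())  # ['Mu?oz', 'Fran?ois', 'Smith', nan]
```

Either way the original spelling is gone, which is why writing the file as UTF-8 is usually the better choice when the text must be preserved.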

  • 2020-12-15 08:48

    Check the answer here

    It's a much simpler solution:

    newdf.to_csv('filename.csv', encoding='utf-8')
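Since the question is about a pipe-delimited file, here is a sketch of the full round trip, assuming UTF-8 on both the write and the read side; the data and the temporary directory are hypothetical. The sep='|' argument sets the delimiter for both to_csv and read_csv.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"name": ["François", "Muñoz"], "city": ["Paris", "Zürich"]})  # hypothetical data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "filename.csv")

    # Write pipe-delimited UTF-8; index=False keeps the index out of the file.
    df.to_csv(path, sep="|", index=False, encoding="utf-8")

    # Read it back with the same delimiter and encoding.
    round_trip = pd.read_csv(path, sep="|", encoding="utf-8")

print(round_trip["city"].tolist())  # ['Paris', 'Zürich']
```

On Python 3, to_csv already defaults to UTF-8; older pandas documentation listed 'ascii' as the default on Python 2, which is a likely source of this error, so passing encoding explicitly is the safe habit.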
    
  • 2020-12-15 08:54

    Try this; it works:

    newdf.to_csv('filename.csv', encoding='utf-8')
