I'm trying to read and write a dataframe to a pipe-delimited file. Some of the characters are non-Roman letters (`, ç, ñ, etc.). But it breaks when I try to write out the a
You have some characters that are not ASCII and therefore cannot be encoded the way you are trying to. I would just use utf-8, as suggested in a comment.
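As a rough sketch of what that looks like for a pipe-delimited file: the helper name `roundtrip_pipe_delimited`, the sample data, and the file name below are made up for illustration, and only `to_csv`/`read_csv` with `sep` and `encoding` come from pandas itself.

```python
import os
import tempfile

import pandas as pd


def roundtrip_pipe_delimited(df, path):
    """Write df as a UTF-8, pipe-delimited file and read it back."""
    df.to_csv(path, sep="|", encoding="utf-8", index=False)
    return pd.read_csv(path, sep="|", encoding="utf-8")


# Hypothetical sample data containing non-ASCII letters
# (escapes are used so the example does not depend on the file's own encoding).
sample = pd.DataFrame(
    {"name": ["Fran\u00e7ois", "Se\u00f1or", "plain"], "n": [1, 2, 3]}
)

with tempfile.TemporaryDirectory() as tmp:
    restored = roundtrip_pipe_delimited(sample, os.path.join(tmp, "sample.txt"))
```

If the encoding is the only problem, `restored` should hold the same names as `sample`, accents included.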
To check which lines are causing the issue you can try something like this:
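def is_not_ascii(string):
    # Non-string values (None, NaN) are never flagged.
    return isinstance(string, str) and any(ord(s) >= 128 for s in string)

# Rows whose value in column col contains a non-ASCII character.
df[df[col].apply(is_not_ascii)]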
You'll need to specify the column col you are testing.
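For reference, here is the same check run end to end on a small made-up frame. The column name `text` and the sample values are assumptions for the demo, and the `isinstance` guard is there so that missing values do not raise.

```python
import pandas as pd


def is_not_ascii(value):
    """Return True for strings containing at least one non-ASCII character."""
    return isinstance(value, str) and any(ord(ch) >= 128 for ch in value)


# Hypothetical sample data; None stands in for a missing value.
df = pd.DataFrame({"text": ["fa\u00e7ade", "plain", None, "ni\u00f1o"]})
col = "text"

# Boolean mask of offending rows, then the rows themselves.
bad_rows = df[df[col].apply(is_not_ascii)]
```

`bad_rows` keeps the original index, so you can see which lines of the input to look at.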
Another solution is to use the string methods encode/decode with the 'ignore' option, but be aware that this removes the non-ASCII characters instead of preserving them:
df['text'] = df['text'].apply(lambda x: x.encode('ascii', 'ignore').decode('ascii'))
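To make the data loss concrete, here is a small sketch in plain Python, with no dataframe involved. The helper names are made up; `'ignore'` and `'replace'` are the standard error handlers accepted by `str.encode`.

```python
def strip_non_ascii(text):
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode("ascii", "ignore").decode("ascii")


def mask_non_ascii(text):
    """Replace every non-ASCII character with '?' instead of dropping it."""
    return text.encode("ascii", "replace").decode("ascii")


# "señor façade", written with escapes so the example is encoding-independent.
original = "se\u00f1or fa\u00e7ade"

stripped = strip_non_ascii(original)  # "seor faade"
masked = mask_non_ascii(original)     # "se?or fa?ade"
```

Either way the original letters are gone for good, which is why writing the file as utf-8 is usually the better choice.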
Check the answer here
It's a much simpler solution:
newdf.to_csv('filename.csv', encoding='utf-8')
Try this, it works:
newdf.to_csv('filename.csv', encoding='utf-8')