Is there any way to preprocess text files and skip these characters?
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa1 in position 1395: invalid sta
Try this:
str.decode('utf-8',errors='ignore')
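For illustration, here is a minimal sketch of what errors='ignore' does to a stray byte like the 0xa1 in the traceback. The sample bytes and the helper name decode_skipping_bad_bytes are made up for this example, and it assumes the data is already in memory as bytes:

```python
# Hypothetical sample: mostly ASCII text with a stray 0xa1 byte,
# which is not a valid UTF-8 start byte.
raw = b"hola \xa1mundo"


def decode_skipping_bad_bytes(data, encoding="utf-8"):
    """Decode bytes, silently dropping any byte that is invalid in `encoding`."""
    return data.decode(encoding, errors="ignore")


text = decode_skipping_bad_bytes(raw)
print(repr(text))
```

Note that the offending byte is dropped without warning, so some characters from the original file are lost. If you would rather see where the bad bytes were, errors='replace' inserts U+FFFD in their place instead.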
I think your text file contains some special characters that 'utf-8' can't decode.
Try using 'ISO-8859-1' instead of 'utf-8', like this:
import sys
# Python 2 only: reload() restores sys.setdefaultencoding, which is removed at startup
reload(sys).setdefaultencoding("ISO-8859-1")
# put your code here
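If you would rather not change the interpreter-wide default, a rough alternative is to pass the encoding when opening the file. This is only a sketch: the helper name read_text and the sample bytes are hypothetical, and it assumes the file really is ISO-8859-1. That codec maps every possible byte to a character, so it never raises a decode error, but it will produce wrong characters if the file is actually in some other encoding.

```python
import io
import os
import tempfile


def read_text(path, encoding="ISO-8859-1"):
    """Read a whole file as text using an explicit encoding."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Demo: write a hypothetical file whose bytes are valid Latin-1 but not valid UTF-8.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(b"caf\xe9 \xa1hola")

content = read_text(sample_path)
os.remove(sample_path)
print(repr(content))
```

io.open takes the same encoding argument on Python 2.6+ and Python 3, so the file is decoded at read time and the rest of the program is unaffected.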