I am writing a Python MapReduce word count program. The problem is that there are many non-alphabet chars strewn about in the data. I have found this post Stripping everything b
Try:
s = ''.join(filter(str.isalnum, s))
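This takes every character from the string, keeps only the alphanumeric ones, and builds a new string from them. Note that whitespace is not alphanumeric, so it is removed too: for a word count, apply this to each word after splitting, not to the whole line. If you want letters only (no digits), use str.isalpha instead of str.isalnum.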
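For the word count case, here is a minimal sketch of how this could fit into a mapper. The names clean_word and map_line are hypothetical, and lowercasing the words is my assumption (so that "Hello" and "hello" count as the same word); adapt it to however your MapReduce framework feeds lines in and emits pairs.

```python
def clean_word(word):
    # Keep only alphanumeric characters.
    # Swap in str.isalpha to drop digits as well.
    return ''.join(filter(str.isalnum, word))


def map_line(line):
    # Hypothetical mapper step: returns (word, 1) pairs for one input line.
    # Split on whitespace first so word boundaries survive the stripping.
    pairs = []
    for token in line.split():
        word = clean_word(token).lower()  # lowercasing is an assumption
        if word:  # skip tokens that were nothing but punctuation
            pairs.append((word, 1))
    return pairs
```

Tokens made up entirely of punctuation become empty strings after cleaning, which is why the mapper skips them instead of counting an empty word.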