I get some data from a webpage and read it like this in Python:
origional_doc = urllib2.urlopen(url).read()
Sometimes this url has characters su
Using the re module, you can remove every character that falls in a given hexadecimal range, here the non-ASCII range \x80-\xFF:
>>> import re
>>> re.sub('[\x80-\xFF]', '', 'é and ä and ect')
' and  and ect'
You can also do the inverse and remove anything that's NOT one of the 128 basic ASCII characters:
>>> re.sub('[^\x00-\x7F]', '', 'é and ä and ect')
' and  and ect'
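For reference, here is a minimal sketch of the same idea wrapped in a reusable function. It assumes Python 3 and that the page body is UTF-8 encoded bytes; the helper name strip_non_ascii and the sample bytes are made up for illustration and are not part of the original code.

```python
import re

# Precompiled pattern: any character outside the 7-bit ASCII range.
_NON_ASCII = re.compile(r'[^\x00-\x7F]')

def strip_non_ascii(text):
    """Hypothetical helper: return text with all non-ASCII characters removed."""
    return _NON_ASCII.sub('', text)

# Stand-in for the bytes returned by reading a page (assumed to be UTF-8).
raw = 'é and ä and ect'.encode('utf-8')

# Decode first, then strip with the regex.
cleaned = strip_non_ascii(raw.decode('utf-8'))

# Alternative without re: let the ASCII codec silently drop undecodable bytes.
alt = raw.decode('ascii', errors='ignore')

print(repr(cleaned))
print(repr(alt))
```

Both approaches throw the accented characters away rather than converting them, so the surrounding spaces stay where they were.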