Remove accented characters from string - Python

小蘑菇 2021-01-28 10:09

I get some data from a webpage and read it like this in Python:

origional_doc = urllib2.urlopen(url).read()

Sometimes this url has characters su

2 Answers
  •  既然无缘
    2021-01-28 10:41

    Using re, you can sub out all characters that fall in a given hexadecimal range, here \x80-\xFF, the range just above 7-bit ASCII:

    >>> re.sub('[\x80-\xFF]','','é and ä and ect')
    ' and  and ect'
    

    You can also do the inverse and sub out anything that's NOT in the basic 128 ASCII characters:

    >>> re.sub('[^\x00-\x7F]','','é and ä and ect')
    ' and  and ect'
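
As a rough sketch, the second substitution above could be wrapped in a small helper. This assumes Python 3 and text that has already been decoded to `str`. The names `drop_non_ascii` and `strip_accents` are hypothetical, and `strip_accents` is a different technique from the answer's regex: it uses `unicodedata` to keep the base letter and drop only the accent mark, which may be closer to what the question title asks for.

```python
import re
import unicodedata

def drop_non_ascii(text):
    """Delete every character outside the basic 128 ASCII characters."""
    return re.sub(r'[^\x00-\x7F]', '', text)

def strip_accents(text):
    """Hypothetical alternative: decompose each character (NFKD), then
    drop the combining marks so the unaccented base letter remains."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

sample = '\u00e9 and \u00e4 and ect'   # 'é and ä and ect'
print(repr(drop_non_ascii(sample)))    # ' and  and ect'
print(repr(strip_accents(sample)))     # 'e and a and ect'
```

The regex version deletes the accented letters entirely, while the normalization version replaces them with their plain counterparts. Characters with no decomposition are left untouched by `strip_accents`.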
    
