I have the following string:
word = u'Buffalo,\xa0IL\xa060625'
I don't want the "\xa0" in there. How can I get rid of it? The st
The most robust way would be to use the third-party unidecode module to convert all non-ASCII characters to their closest ASCII equivalent automatically.
The character \xa0 (not \xa as you stated) is a NO-BREAK SPACE, and the closest ASCII equivalent would of course be a regular space.
import unidecode
word = unidecode.unidecode(word)
There is no \xa there. If you try to put that into a string literal, you're going to get a syntax error if you're lucky, or it's going to swallow up the next attempted character if you're not, because \x sequences always have to be followed by two hexadecimal digits.
What you have is \xa0, which is an escape sequence for the character U+00A0, aka "NO-BREAK SPACE".
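If you want to confirm which characters are actually in the string, the standard library's unicodedata module can report their official names. A quick check, assuming Python 3 (where the u prefix is optional and has no effect); the variable name non_ascii is just for illustration:

```python
import unicodedata

word = u'Buffalo,\xa0IL\xa060625'

# Collect the code point and Unicode name of every distinct non-ASCII character.
non_ascii = sorted(
    {(hex(ord(ch)), unicodedata.name(ch)) for ch in word if ord(ch) > 127}
)
print(non_ascii)  # [('0xa0', 'NO-BREAK SPACE')]
```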
I think you want to replace them with spaces, but whatever you want to do is pretty easy to write:
word.replace(u'\xa0', u' ') # replaced with space
word.replace(u'\xa0', u'0') # closest to what you were literally asking for
word.replace(u'\xa0', u'') # removed completely
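One thing worth keeping in mind: strings are immutable, so .replace returns a new string and leaves word untouched. A small sketch, assuming Python 3 and using cleaned as an illustrative name:

```python
word = u'Buffalo,\xa0IL\xa060625'

# replace() does not modify word in place; keep the returned string.
cleaned = word.replace(u'\xa0', u' ')

print(repr(word))     # the original still contains \xa0
print(repr(cleaned))  # 'Buffalo, IL 60625'
```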
If you know for sure that is the only character you don't want, you can .replace it:
>>> word.replace(u'\xa0', ' ')
u'Buffalo, IL 60625'
If you need to handle all non-ASCII characters, encoding to ASCII and replacing the characters that can't be encoded might be a good start:
>>> word.encode('ascii', 'replace')
'Buffalo,?IL?60625'
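The second argument to encode is the error handler, and the choice changes what happens to the no-break spaces. A rough comparison, assuming Python 3, where encode returns bytes rather than a str (call .decode('ascii') on the result to get text back); the results dict is only there for illustration:

```python
word = u'Buffalo,\xa0IL\xa060625'

# Encode with two of the standard error handlers and keep the results side by side.
results = {
    handler: word.encode('ascii', handler)
    for handler in ('replace', 'ignore')
}

print(results['replace'])  # b'Buffalo,?IL?60625'  (each bad character becomes ?)
print(results['ignore'])   # b'Buffalo,IL60625'    (bad characters are dropped)
```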
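You can easily use unicodedata to get rid of \x... characters like this one. NFKD normalization replaces compatibility characters such as \xa0 with their plain equivalents (here, a regular space); it does not remove every non-ASCII character.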
>>> from unicodedata import normalize
>>> normalize('NFKD', word)
'Buffalo, IL 60625'
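Normalization alone leaves other non-ASCII characters in place: an accented letter, for example, is only split into a base letter plus a combining mark. If you want pure ASCII out, one option is to normalize first and then drop whatever is left. A sketch assuming Python 3; to_ascii is a made-up helper name, not part of unicodedata:

```python
import unicodedata

def to_ascii(text):
    """Hypothetical helper: NFKD-normalize, then drop anything still non-ASCII."""
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' silently discards characters with no ASCII encoding,
    # such as the combining accents produced by the decomposition.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii(u'Buffalo,\xa0IL\xa060625'))  # Buffalo, IL 60625
print(to_ascii(u'caf\xe9'))                  # cafe
```

Doing the normalization before the encode step is what keeps the space: encoding with 'ignore' directly would delete the no-break spaces instead of converting them.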
This seems to work for getting rid of non-ASCII characters (they are dropped entirely, so the no-break spaces disappear rather than becoming regular spaces):
fixedword = word.encode('ascii','ignore')