I need to replace all non-ASCII (\\x00-\\x7F) characters with a space. I\'m surprised that this is not dead-easy in Python, unless I\'m missing something. The following func
Potentially for a different question, but I'm providing my version of @Alvero's answer (using unidecode). I want to do a "regular" strip on my strings, i.e. the beginning and end of my string for whitespace characters, and then replace only other whitespace characters with a "regular" space, i.e.
"Ceñíaㅤmañanaㅤㅤㅤㅤ"
to
"Ceñía mañana"
,
def safely_stripped(s: str):
return ' '.join(
stripped for stripped in
(bit.strip() for bit in
''.join((c if unidecode(c) else ' ') for c in s).strip().split())
if stripped)
We first replace all non-unicode spaces with a regular space (and join it back again),
''.join((c if unidecode(c) else ' ') for c in s)
And then we split that again, with python's normal split, and strip each "bit",
(bit.strip() for bit in s.split())
And lastly join those back again, but only if the string passes an if test,
' '.join(stripped for stripped in s if stripped)
And with that, safely_stripped('ㅤㅤㅤㅤCeñíaㅤmañanaㅤㅤㅤㅤ') correctly returns 'Ceñía mañana'.