Replace non-ASCII characters with a single space

死守一世寂寞 2020-11-22 16:17

I need to replace all non-ASCII characters (anything outside \x00-\x7F) with a space. I'm surprised that this is not dead-easy in Python, unless I'm missing something. The following func
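A minimal sketch of the task as stated, assuming a regular-expression substitution is acceptable. The name replace_non_ascii is hypothetical; the question's own function is cut off above and is not reproduced here. Each character outside \x00-\x7F becomes exactly one space.

```python
import re

# Matches any single character outside the ASCII range \x00-\x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def replace_non_ascii(text: str) -> str:
    """Replace each non-ASCII character with one space (hypothetical helper)."""
    return NON_ASCII.sub(" ", text)


print(replace_non_ascii("Ce\u00f1\u00eda ma\u00f1ana"))  # Ce  a ma ana
```

Because the pattern matches one character at a time, a run of non-ASCII characters turns into a run of spaces of the same length. Appending + to the character class would instead collapse each run into a single space.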

7 answers
  •  暖寄归人
    2020-11-22 16:36

    This may fit a slightly different question, but here is my version of @Alvero's answer (using unidecode). I want to do a "regular" strip on my strings, i.e. remove whitespace characters at the beginning and end of the string, and then replace any remaining whitespace-like characters with a "regular" space, i.e.

    "Ceñíaㅤmañanaㅤㅤㅤㅤ"
    

    to

    "Ceñía mañana"
    

    with this function:

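    from unidecode import unidecode  # third-party: pip install Unidecode

    def safely_stripped(s: str) -> str:
        # Characters that unidecode maps to an empty string become spaces.
        return ' '.join(
            stripped for stripped in
            (bit.strip() for bit in
             ''.join((c if unidecode(c) else ' ') for c in s).strip().split())
            if stripped)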
    

    We first replace every character that unidecode transliterates to an empty string with a regular space (and join the result back into one string),

    ''.join((c if unidecode(c) else ' ') for c in s)
    

    Then we split that string with Python's normal split, and strip each "bit",

    (bit.strip() for bit in s.split())
    

    Lastly, we join those bits back together with single spaces, keeping only the bits that are non-empty,

    ' '.join(stripped for stripped in s if stripped)
    

    And with that, safely_stripped('ㅤㅤㅤㅤCeñíaㅤmañanaㅤㅤㅤㅤ') correctly returns 'Ceñía mañana'.
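
    The same three steps can also be sketched without unidecode, assuming you know in advance which characters should count as blank. The set BLANK_LIKE and the name collapse_blanks are hypothetical and not part of the answer; the sketch also assumes the invisible character in the example is U+3164 HANGUL FILLER, which str.split() does not treat as whitespace.

```python
# Hypothetical set of characters to treat as blank; extend as needed.
BLANK_LIKE = {"\u3164"}  # U+3164 HANGUL FILLER (assumed invisible character)


def collapse_blanks(s: str) -> str:
    # Step 1: map every blank-like character to a regular space.
    spaced = "".join(" " if c in BLANK_LIKE else c for c in s)
    # Steps 2 and 3: split() with no arguments already drops leading and
    # trailing whitespace and never yields empty pieces, so one join suffices.
    return " ".join(spaced.split())


print(collapse_blanks("\u3164\u3164Ce\u00f1\u00eda\u3164ma\u00f1ana\u3164\u3164"))
```

    Accented letters such as ñ and í pass through unchanged, since only the characters listed in BLANK_LIKE are replaced.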
