Replace non-ASCII characters with a single space

后端 未结 7 1581
死守一世寂寞
死守一世寂寞 2020-11-22 16:17

I need to replace all non-ASCII (\\x00-\\x7F) characters with a space. I\'m surprised that this is not dead-easy in Python, unless I\'m missing something. The following func

7条回答
  •  萌比男神i
    2020-11-22 16:33

    Your ''.join() expression is filtering, removing anything non-ASCII; you could use a conditional expression instead:

    return ''.join([i if ord(i) < 128 else ' ' for i in text])
    

    This handles characters one by one and would still use one space per character replaced.

    Your regular expression should just replace consecutive non-ASCII characters with a space:

    re.sub(r'[^\x00-\x7F]+',' ', text)
    

    Note the + there.

提交回复
热议问题