How to strip all whitespace from string

暖寄归人 2020-11-28 18:25

How do I strip all the spaces in a Python string? For example, I want a string like "strip my spaces" to be turned into "stripmyspaces", but I cannot s

11 answers

陌清茗 (OP)

2020-11-28 19:01
For Python 3:
```
>>> import re
>>> re.sub(r'\s+', '', 'strip my \n\t\r ASCII and \u00A0 \u2003 Unicode spaces')
'stripmyASCIIandUnicodespaces'
>>> # Or, depending on the situation:
>>> re.sub(r'(\s|\u180E|\u200B|\u200C|\u200D|\u2060|\uFEFF)+', '', \
... '\uFEFF\t\t\t strip all \u000A kinds of \u200B whitespace \n')
'stripallkindsofwhitespace'
```
...handles any whitespace characters that you're not thinking of - and believe us, there are plenty.

\s on its own always covers the ASCII whitespace:
- (regular) space
- tab
- new line (\n)
- carriage return (\r)
- form feed
- vertical tab
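
As a quick check of the list above, here is a minimal sketch that tests each of the six ASCII whitespace characters against `\s`. The dictionary name and labels are only illustrative, not part of the original answer.

```python
import re

# The six ASCII whitespace characters listed above (labels are illustrative).
ascii_whitespace = {
    "space": " ",
    "tab": "\t",
    "new line": "\n",
    "carriage return": "\r",
    "form feed": "\f",
    "vertical tab": "\v",
}

# re.fullmatch succeeds only if \s matches the whole one-character string.
matched = {name: bool(re.fullmatch(r"\s", ch)) for name, ch in ascii_whitespace.items()}

for name, ok in matched.items():
    print(f"{name}: {ok}")
```
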
Additionally:
- for Python 2 with re.UNICODE enabled,
- for Python 3 with str patterns, without any extra flags,
...\s also covers the Unicode whitespace characters, for example:
- non-breaking space,
- em space,
- ideographic space,
...etc. See the full list here, under "Unicode characters with White_Space property".
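
A small sketch of the Python 3 behaviour described above, using the three example characters. The `re.ASCII` comparison is an addition that the original answer does not mention; it is included only to show what the Unicode-aware default changes.

```python
import re

# Non-breaking space (U+00A0), em space (U+2003), ideographic space (U+3000).
text = "non\u00A0breaking, em\u2003space, ideographic\u3000space"

# Default for str patterns in Python 3: \s is Unicode-aware.
unicode_aware = re.sub(r"\s+", "", text)

# With re.ASCII, \s is limited to the ASCII whitespace characters,
# so only the plain spaces after the commas are removed.
ascii_only = re.sub(r"\s+", "", text, flags=re.ASCII)

print(repr(unicode_aware))
print(repr(ascii_only))
```
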

However, \s DOES NOT cover characters that are not classified as whitespace but are de facto whitespace, such as:
- zero-width joiner,
- Mongolian vowel separator,
- zero-width non-breaking space (a.k.a. byte order mark),
...etc. See the full list here, under "Related Unicode characters without White_Space property".
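
To see this limitation in practice, the sketch below checks the three characters named above against `\s`. It assumes a recent Python 3 release; older interpreters built on Unicode data before 6.3 may still classify the Mongolian vowel separator as whitespace. The variable names are illustrative.

```python
import re

# Characters that look like whitespace but lack the White_Space property.
not_whitespace = {
    "zero-width joiner": "\u200D",
    "Mongolian vowel separator": "\u180E",
    "zero-width non-breaking space": "\uFEFF",
}

for name, ch in not_whitespace.items():
    # re.search returns None when \s does not match the character.
    print(name, re.search(r"\s", ch))

# A plain \s+ substitution therefore leaves these characters in place.
sample = "zero\u200Dwidth\uFEFF"
unchanged = re.sub(r"\s+", "", sample)
print(unchanged == sample)
```
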

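The second regex therefore lists these six characters explicitly: Mongolian vowel separator (U+180E), zero-width space (U+200B), zero-width non-joiner (U+200C), zero-width joiner (U+200D), word joiner (U+2060), and zero-width no-break space (U+FEFF).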
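
If this is needed in more than one place, the pattern could be wrapped in a small helper. The following is only a sketch: `strip_all_whitespace` and `_WHITESPACE_LIKE` are hypothetical names, and the character class assumes the six code points U+180E, U+200B, U+200C, U+200D, U+2060 and U+FEFF in addition to `\s`.

```python
import re

# Hypothetical helper, not part of the original answer.
# \s plus six whitespace-like code points that \s does not match.
_WHITESPACE_LIKE = re.compile(r"[\s\u180E\u200B\u200C\u200D\u2060\uFEFF]+")


def strip_all_whitespace(text: str) -> str:
    """Remove every run of whitespace and whitespace-like characters."""
    return _WHITESPACE_LIKE.sub("", text)


result = strip_all_whitespace("\uFEFF\t\t\t strip all \u000A kinds of \u200B whitespace \n")
print(result)
```

Compiling the pattern once avoids re-parsing it on every call, and a character class keeps the alternation short; which extra characters belong in the class depends on the input you expect.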

Sources:
- https://docs.python.org/2/library/re.html
- https://docs.python.org/3/library/re.html
- https://en.wikipedia.org/wiki/Unicode_character_property