How to strip all whitespace from string

暖寄归人 2020-11-28 18:25

How do I strip all the spaces in a Python string? For example, I want a string like "strip my spaces" to be turned into "stripmyspaces", but I cannot s

11 answers
  •  陌清茗 (original poster)
    2020-11-28 19:01

    For Python 3:

    >>> import re
    >>> re.sub(r'\s+', '', 'strip my \n\t\r ASCII and \u00A0 \u2003 Unicode spaces')
    'stripmyASCIIandUnicodespaces'
    >>> # Or, depending on the situation:
    >>> re.sub(r'(\s|\u180E|\u200B|\u200C|\u200D|\u2060|\uFEFF)+', '', \
    ... '\uFEFF\t\t\t strip all \u000A kinds of \u200B whitespace \n')
    'stripallkindsofwhitespace'
    

    ...handles any whitespace characters that you're not thinking of - and believe us, there are plenty.

    \s on its own always covers the ASCII whitespace:

    • (regular) space
    • tab
    • new line (\n)
    • carriage return (\r)
    • form feed
    • vertical tab
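
    As a quick check of the list above, here is a minimal sketch. It assumes the re.ASCII flag is used to restrict \s to ASCII-only matching; the variable names are illustrative only.

```python
import re

# The six ASCII whitespace characters listed above:
# space, tab, new line, carriage return, form feed, vertical tab.
ascii_whitespace = ' \t\n\r\f\v'

# With re.ASCII, \s matches only these six characters.
matches = re.findall(r'\s', ascii_whitespace, flags=re.ASCII)
print(len(matches))  # 6

# Under re.ASCII a non-breaking space (U+00A0) is left in place.
ascii_only = re.sub(r'\s+', '', 'a \t b\u00A0c', flags=re.ASCII)
print(ascii_only == 'ab\u00A0c')  # True
```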

    Additionally:

    • for Python 2 with re.UNICODE enabled,
    • for Python 3 by default, when the pattern is a str and re.ASCII is not set,

    ...\s also covers the Unicode whitespace characters, for example:

    • non-breaking space,
    • em space,
    • ideographic space,

    ...etc. See the full list here, under "Unicode characters with White_Space property".
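
    The difference can be seen by running the same substitution with and without re.ASCII. This is a small sketch for Python 3; the sample string is made up for illustration.

```python
import re

# Sample text containing a non-breaking space (U+00A0), an em space (U+2003)
# and an ideographic space (U+3000), plus two regular spaces after the commas.
text = 'non\u00A0breaking, em\u2003space, ideographic\u3000space'

# Python 3 str patterns are Unicode-aware by default.
unicode_stripped = re.sub(r'\s+', '', text)
print(unicode_stripped)  # nonbreaking,emspace,ideographicspace

# re.ASCII narrows \s to ASCII whitespace, so only the regular spaces go.
ascii_stripped = re.sub(r'\s+', '', text, flags=re.ASCII)
print(ascii_stripped == 'non\u00A0breaking,em\u2003space,ideographic\u3000space')  # True
```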

    However, \s DOES NOT cover characters that are not classified as whitespace but behave like whitespace in practice, such as:

    • zero-width joiner,
    • Mongolian vowel separator,
    • zero-width non-breaking space (a.k.a. byte order mark),

    ...etc. See the full list here, under "Related Unicode characters without White_Space property".
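
    To see this for yourself, the sketch below looks up the three characters named above and tests each against \s. The code points are taken from the Unicode character database rather than from this answer, and the results assume a recent Python 3 with the default Unicode-aware matching.

```python
import re
import unicodedata

# Zero-width joiner, Mongolian vowel separator, zero-width non-breaking space.
candidates = ['\u200D', '\u180E', '\uFEFF']

# Map each character's official Unicode name to whether \s matches it.
results = {}
for ch in candidates:
    name = unicodedata.name(ch)
    results[name] = re.fullmatch(r'\s', ch) is not None
    print(f'U+{ord(ch):04X} {name}: matched by \\s = {results[name]}')
```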

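    The six extra alternatives in the second regex are meant to cover such characters: Mongolian vowel separator (U+180E), zero-width space (U+200B), zero-width non-joiner (U+200C), zero-width joiner (U+200D), word joiner (U+2060), and zero-width non-breaking space (U+FEFF).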
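
    If you need this in more than one place, one option is to compile the pattern once and wrap it in a function. This is only a sketch: strip_all_whitespace and _WHITESPACE_LIKE are hypothetical names, the alternation is rewritten as a single character class, and U+180E is used as the code point of the Mongolian vowel separator.

```python
import re

# \s plus six whitespace-like code points, written as one character class:
# U+180E Mongolian vowel separator, U+200B zero-width space,
# U+200C zero-width non-joiner, U+200D zero-width joiner,
# U+2060 word joiner, U+FEFF zero-width non-breaking space.
_WHITESPACE_LIKE = re.compile(r'[\s\u180E\u200B\u200C\u200D\u2060\uFEFF]+')


def strip_all_whitespace(text: str) -> str:
    """Remove whitespace and the whitespace-like characters listed above."""
    return _WHITESPACE_LIKE.sub('', text)


print(strip_all_whitespace('\uFEFF\t strip all \u200B kinds \u2060 of whitespace \n'))
# stripallkindsofwhitespace
```

    Extend the character class if your data contains other invisible characters; the six listed here are not exhaustive.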

    Sources:

    • https://docs.python.org/2/library/re.html
    • https://docs.python.org/3/library/re.html
    • https://en.wikipedia.org/wiki/Unicode_character_property
