How to remove non-alphanumeric characters at the beginning or end of a string

故里飘歌 2020-12-07 03:27

I have a list with elements that have unnecessary (non-alphanumeric) characters at the beginning or end of each string.

For example:

'cats--'
5 answers
  • 2020-12-07 03:32
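    def strip_nonalnum(word):
        if not word:
            return word  # nothing to strip
        # Index of the first alphanumeric character
        for start, c in enumerate(word):
            if c.isalnum():
                break
        else:
            return ""  # no alphanumeric characters at all
        # Number of non-alphanumeric characters at the end
        for end, c in enumerate(reversed(word)):
            if c.isalnum():
                break
        return word[start:len(word) - end]
    
    thelist = ['cats--', '#!cats-%', '--the#!cats-%']  # example input
    print([strip_nonalnum(s) for s in thelist])  # ['cats', 'cats', 'the#!cats']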
    

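    Or, with a regular expression:

    import re
    
    def strip_nonalnum_re(word):
        # \W matches non-word characters; "_" counts as a word character,
        # so leading/trailing underscores are kept
        return re.sub(r"^\W+|\W+$", "", word)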
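    The two versions above differ on underscores: str.isalnum() returns False for "_", so the loop strips it, while \W does not match "_" (\w includes it), so the regex keeps it. A minimal check, assuming Python 3; `sample` and the other variable names are illustrative only:

```python
import re

sample = "_cats_--"  # illustrative input, not from the question

# str.isalnum() treats "_" as NOT alphanumeric, so the loop version strips it
underscore_is_alnum = "_".isalnum()  # False

# \W means "not a word character", and \w includes "_", so \W never matches "_"
underscore_is_nonword = re.fullmatch(r"\W", "_") is not None  # False

# As a result, the regex version only removes the trailing "--"
by_regex = re.sub(r"^\W+|\W+$", "", sample)
print(by_regex)  # _cats_
```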
    
  • 2020-12-07 03:32

    I believe that this is the shortest non-regex solution:

    text = "`23`12foo--=+"
    
    while text and not text[0].isalnum():
        text = text[1:]   # drop the first character
    while text and not text[-1].isalnum():
        text = text[:-1]  # drop the last character
    
    print(text)  # 23`12foo
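    Each text[1:] or text[:-1] builds a new string, so a long run of junk characters gets copied repeatedly. A sketch of the same idea that only moves two indices and slices once (the function name strip_ends is hypothetical):

```python
def strip_ends(text):
    """Strip non-alphanumeric characters from both ends of text."""
    start, end = 0, len(text)
    # Advance start past leading non-alphanumeric characters
    while start < end and not text[start].isalnum():
        start += 1
    # Move end back past trailing non-alphanumeric characters
    while end > start and not text[end - 1].isalnum():
        end -= 1
    return text[start:end]  # a single slice at the end


print(strip_ends("`23`12foo--=+"))  # 23`12foo
```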
    
  • 2020-12-07 03:35

    You can use a regular expression. The re.sub() function takes three arguments:

    • The pattern
    • The replacement
    • The string

    Code:

    import re
    
    s = 'cats--'
    output = re.sub(r"[^\w]", "", s)
    
    print(output)  # cats
    

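    Explanation:

    • \w matches any word character: a letter, a digit, or the underscore.
    • [^x] matches any character that is not x, so [^\w] matches any non-word character.

    Note that this removes every such character in the string, including ones in the middle, not only those at the beginning or end.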
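    Adding the anchors ^ and $ to the same character class limits the match to the two ends of the string. A minimal comparison on an illustrative input:

```python
import re

s = "--the#!cats-%"  # illustrative input

# Unanchored: every non-word character goes, including "#!" in the middle
everywhere = re.sub(r"[^\w]", "", s)

# Anchored: only a run at the start (^) or a run at the end ($) is removed
ends_only = re.sub(r"^[^\w]+|[^\w]+$", "", s)

print(everywhere)  # thecats
print(ends_only)   # the#!cats
```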
  • 2020-12-07 03:39

    With str.strip() you have to know which characters to strip:

    >>> 'cats--'.strip('-')
    'cats'
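    str.strip() treats its argument as a set of characters and removes any combination of them from both ends, so their order does not matter. One option, assuming the unwanted characters are ASCII punctuation, is string.punctuation from the standard library (it does not cover whitespace or non-ASCII symbols, and it includes "_"):

```python
import string

thelist = ['cats5--', '#!cats-%', '--the#!cats-%', '--5cats-%', '--5!cats-%']

# Strip any ASCII punctuation characters from both ends; characters in the
# middle (such as "#!" in '--the#!cats-%') are left alone
stripped = [s.strip(string.punctuation) for s in thelist]
print(stripped)  # ['cats5', 'cats', 'the#!cats', '5cats', '5!cats']
```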
    

    You could use re to remove the non-alphanumeric characters, but IMO that is like using a cannon to shoot a mouse. str.isalpha() returns True if a string consists only of alphabetic characters, so you can test each character and keep only the alphabetic ones:

    >>> ''.join(char for char in '#!cats-%' if char.isalpha())
    'cats'
    >>> thelist = ['cats5--', '#!cats-%', '--the#!cats-%', '--5cats-%', '--5!cats-%']
    >>> [''.join(c for c in e if c.isalpha()) for e in thelist]
    ['cats', 'cats', 'thecats', 'cats', 'cats']
    

    Since you want to remove only non-alphanumeric characters, str.isalnum() is the better test, because it also keeps digits:

    >>> [''.join(c for c in e if c.isalnum()) for e in thelist]
    ['cats5', 'cats', 'thecats', '5cats', '5cats']
    

    For this list, this gives exactly the same result as the re approach from Christian's answer (though \w, unlike isalnum(), also keeps underscores):

    >>> import re
    >>> [re.sub("[^\\w]", "", e) for e in thelist]
    ['cats5', 'cats', 'thecats', '5cats', '5cats']
    

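    However, if you want to strip non-alphanumeric characters only from the beginning and end of each string, you need a different pattern, such as this one (see the re documentation). Note that for strings it cannot match, such as single-character ones, re.search() returns None and .groups() then raises AttributeError: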

    >>> [''.join(re.search(r'^\W*(.+)(?!\W*$)(.)', e).groups()) for e in thelist]
    ['cats5', 'cats', 'the#!cats', '5cats', '5!cats']
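    A variant that always matches, even for one-character or empty strings, can be sketched with re.fullmatch() and a lazy group; EDGE_PATTERN and strip_edges are hypothetical names, and re.DOTALL is used so that "." also matches newlines:

```python
import re

# \W* eats leading non-word characters, the lazy (.*?) keeps as little as
# possible, and the final \W* eats trailing ones. fullmatch() must consume
# the whole string, so the pattern matches every input.
EDGE_PATTERN = re.compile(r"\W*(.*?)\W*", re.DOTALL)


def strip_edges(s):
    return EDGE_PATTERN.fullmatch(s).group(1)


thelist = ['cats5--', '#!cats-%', '--the#!cats-%', '--5cats-%', '--5!cats-%']
print([strip_edges(e) for e in thelist])
# ['cats5', 'cats', 'the#!cats', '5cats', '5!cats']
print(repr(strip_edges('a')), repr(strip_edges('')))  # 'a' ''
```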
    
  • 2020-12-07 03:51

    To remove one or more characters other than letters, digits and _ from both ends, you can use:

    re.sub(r'^\W+|\W+$', '', '??cats--') # => cats
    

    Or, if _ should be removed too, put \W inside a character class together with _:

    re.sub(r'^[\W_]+|[\W_]+$', '', '_??cats--_')
    


    See the Python demo:

    import re
    print( re.sub(r'^\W+|\W+$', '', '??cats--') )          # => cats
    print( re.sub(r'^[\W_]+|[\W_]+$', '', '_??cats--_') )  # => cats
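    Since the question starts from a list, one way to apply these patterns is to compile each once and call the compiled pattern's sub() method in a list comprehension. The sample list and the variable names below are illustrative:

```python
import re

# Compile once; Pattern.sub(repl, string) behaves like re.sub(pattern, repl, string)
keep_underscore = re.compile(r"^\W+|\W+$")        # "_" is kept (\w includes it)
drop_underscore = re.compile(r"^[\W_]+|[\W_]+$")  # "_" is stripped as well

thelist = ['cats5--', '#!cats-%', '_cats_', '--the#!cats-%']

kept = [keep_underscore.sub("", s) for s in thelist]
dropped = [drop_underscore.sub("", s) for s in thelist]
print(kept)     # ['cats5', 'cats', '_cats_', 'the#!cats']
print(dropped)  # ['cats5', 'cats', 'cats', 'the#!cats']
```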
    