How to remove non-alphanumeric characters at the beginning or end of a string

Posted by 梦想的初衷 on 2019-12-18 09:02:16

Question


I have a list with elements that have unnecessary (non-alphanumeric) characters at the beginning or end of each string.

Example:

'cats--'

I want to get rid of the --.

I tried:

for i in thelist:
    newlist.append(i.strip('\W'))

That didn't work. Any suggestions?


Answer 1:


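def strip_nonalnum(word):
    if not word:
        return word  # nothing to strip
    # find the index of the first alphanumeric character
    for start, c in enumerate(word):
        if c.isalnum():
            break
    else:
        return ''  # no alphanumeric characters at all, e.g. '-'
    # count the non-alphanumeric characters at the end
    for end, c in enumerate(word[::-1]):
        if c.isalnum():
            break
    return word[start:len(word) - end]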

print([strip_nonalnum(s) for s in thelist])

Or

import re

def strip_nonalnum_re(word):
    return re.sub(r"^\W+|\W+$", "", word)
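The two functions above are not quite interchangeable: str.isalnum() is False for the underscore, while \W does not match it, so the regex version leaves underscores in place. A small sketch of the difference follows. The loop version is restated here so the snippet runs on its own, and the for/else guard and the sample strings are my own additions for illustration:

```python
import re

def strip_nonalnum(word):
    # Index of the first alphanumeric character.
    for start, c in enumerate(word):
        if c.isalnum():
            break
    else:
        return ''  # empty input, or nothing alphanumeric in it
    # Number of non-alphanumeric characters at the end.
    for end, c in enumerate(reversed(word)):
        if c.isalnum():
            break
    return word[start:len(word) - end]

def strip_nonalnum_re(word):
    return re.sub(r"^\W+|\W+$", "", word)

samples = ["cats--", "_cats_", "--", ""]
loop_results = [strip_nonalnum(s) for s in samples]
regex_results = [strip_nonalnum_re(s) for s in samples]

print(loop_results)   # ['cats', 'cats', '', '']
print(regex_results)  # ['cats', '_cats_', '', '']
```

If underscores should go as well, the regex needs a character class such as [\W_].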



Answer 2:


I believe that this is the shortest non-regex solution:

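text = "`23`12foo--=+"

# drop leading characters until the first one is alphanumeric
while len(text) > 0 and not text[0].isalnum():
    text = text[1:]
# drop trailing characters until the last one is alphanumeric
while len(text) > 0 and not text[-1].isalnum():
    text = text[:-1]

print(text)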
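Each slice in the loops above copies the string, so for long runs of junk it may be worth moving two indices and slicing once at the end. A minimal sketch of that variant, assuming the same isalnum() test (the function name strip_ends is just a placeholder of mine):

```python
def strip_ends(text):
    """Trim non-alphanumeric characters from both ends of text."""
    start, end = 0, len(text)
    # Advance past leading non-alphanumeric characters.
    while start < end and not text[start].isalnum():
        start += 1
    # Retreat past trailing non-alphanumeric characters.
    while end > start and not text[end - 1].isalnum():
        end -= 1
    return text[start:end]

print(strip_ends("`23`12foo--=+"))  # 23`12foo
```

Characters in the middle of the string (the second backtick here) are left alone, same as with the while loops.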



Answer 3:


You can use a regular expression. The function re.sub() takes three main parameters:

  • The regular expression pattern
  • The replacement
  • The string

Code:

import re

s = 'cats--'
output = re.sub("[^\\w]", "", s)

print(output)

Explanation:

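  • The part "\\w" matches any word character: a letter, a digit, or the underscore.
  • [^x] matches any character that is not x, so "[^\\w]" matches every non-word character, wherever it occurs in the string.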



Answer 4:


To remove one or more chars other than letters, digits and _ from both ends you may use

re.sub(r'^\W+|\W+$', '', '??cats--') # => cats

Or, if _ is to be removed, too, wrap \W into a character class and add _ there:

re.sub(r'^[\W_]+|[\W_]+$', '', '_??cats--_')

See the Python demo:

import re
print( re.sub(r'^\W+|\W+$', '', '??cats--') )          # => cats
print( re.sub(r'^[\W_]+|[\W_]+$', '', '_??cats--_') )  # => cats
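One thing to keep in mind: in Python 3, \W on str patterns is Unicode-aware by default, so accented letters count as word characters. If only ASCII letters and digits should survive, the re.ASCII flag changes that. A small sketch, with a sample word of my own choosing:

```python
import re

# Same pattern, compiled with and without the ASCII flag.
ENDS = re.compile(r'^\W+|\W+$')
ENDS_ASCII = re.compile(r'^\W+|\W+$', re.ASCII)

word = '\u00a1caf\u00e9!'  # inverted '!' + 'caf' + 'e' with acute accent + '!'

unicode_result = ENDS.sub('', word)      # accented 'e' is a word character
ascii_result = ENDS_ASCII.sub('', word)  # accented 'e' counts as non-word

print(unicode_result)
print(ascii_result)
```

With the default flags the accented letter is kept; with re.ASCII it is stripped along with the trailing punctuation, which is probably not what you want for non-English text.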



Answer 5:


With strip you have to know in advance which characters should be stripped.

>>> 'cats--'.strip('-')
'cats'
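str.strip() reads its argument as a set of individual characters, not as a pattern, which is why strip('\W') in the question left 'cats--' untouched: it only looks for a backslash and a capital W. If ASCII punctuation and whitespace are all that needs trimming (that is an assumption on my part, it won't cover every non-alphanumeric character), string.punctuation can serve as the character set:

```python
import string

# The argument is a set of characters: only '\' and 'W' would be trimmed.
unchanged = 'cats--'.strip(r'\W')
coincidence = r'W\cats\W'.strip(r'\W')

# Assumed set of unwanted characters: ASCII punctuation plus whitespace.
UNWANTED = string.punctuation + string.whitespace

thelist = ['cats--', '#!cats-%', '--the#!cats-%', '  cats5!  ']
cleaned = [s.strip(UNWANTED) for s in thelist]

print(unchanged)    # cats--
print(coincidence)  # cats
print(cleaned)      # ['cats', 'cats', 'the#!cats', 'cats5']
```

string.punctuation includes the underscore, so this also trims leading and trailing underscores.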

You could use re to get rid of the non-alphanumeric characters, but IMO that is like shooting at a mouse with a cannon. With str.isalpha() you can test whether a character is alphabetic, so you only need to keep the characters that pass:

>>> ''.join(char for char in '#!cats-%' if char.isalpha())
'cats'
>>> thelist = ['cats5--', '#!cats-%', '--the#!cats-%', '--5cats-%', '--5!cats-%']
>>> [''.join(c for c in e if c.isalpha()) for e in thelist]
['cats', 'cats', 'thecats', 'cats', 'cats']

Since you want to get rid of non-alphanumeric characters, str.isalnum() is the better fit:

>>> [''.join(c for c in e if c.isalnum()) for e in thelist]
['cats5', 'cats', 'thecats', '5cats', '5cats']

For this list, that is exactly the result you would get with re (as in Christian's answer):

>>> import re
>>> [re.sub("[^\\w]", "", e) for e in thelist]
['cats5', 'cats', 'thecats', '5cats', '5cats']

However, if you want to strip non-alphanumeric characters from the ends of the strings only, you should use another pattern like this one (check the re documentation):

>>> [''.join(re.search(r'^\W*(.+)(?!\W*$)(.)', e).groups()) for e in thelist]
['cats5', 'cats', 'the#!cats', '5cats', '5!cats']
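The search pattern above needs at least two characters after the leading junk, so for a one-character string such as 'a' re.search() returns None and .groups() raises AttributeError. A possible alternative, sketched here with a single lazy capture group (the names ENDS and trim_ends are mine, and re.DOTALL is added on the assumption that inputs might contain newlines):

```python
import re

# Leading non-word run, lazily captured core, trailing non-word run.
ENDS = re.compile(r'^\W*(.*?)\W*$', re.DOTALL)

def trim_ends(s):
    # The pattern can match the empty string, so search() always succeeds.
    return ENDS.search(s).group(1)

thelist = ['cats5--', '#!cats-%', '--the#!cats-%', '--5cats-%', '--5!cats-%']
trimmed = [trim_ends(e) for e in thelist]

print(trimmed)         # ['cats5', 'cats', 'the#!cats', '5cats', '5!cats']
print(trim_ends('a'))  # a
```

For the sample list this gives the same output as the pattern above, and it also copes with 'a', '--' and the empty string.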


Source: https://stackoverflow.com/questions/22650506/how-to-rermove-non-alphanumeric-characters-at-the-beginning-or-end-of-a-string
