Python remove anything that is not a letter or number

甜味超标 2020-12-24 01:40

I'm having a little trouble with Python regular expressions.

What is a good way to remove all characters in a string that are not letters or numbers?

Thanks

7 answers
  • 2020-12-24 02:02

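    You can also use the isalpha and isnumeric string methods, for example:

    text = 'base, sample test;'
    # keep only the letters and numeric characters of a single word
    get_vals = lambda word: ''.join(c for c in word if c.isalpha() or c.isnumeric())
    # filter each space-separated word, then rejoin the words with spaces
    ' '.join(map(get_vals, text.split(' ')))  # 'base sample test'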
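If spaces should be dropped as well, the per-character test above can be collapsed into a single pass over the whole string. The following is a minimal sketch, assuming that "letter or number" may include non-ASCII characters; `keep_alnum` is a hypothetical helper name, not something from the answer.

```python
def keep_alnum(text):
    """Hypothetical helper: keep only characters for which str.isalnum() is true."""
    # str.isalnum() accepts letters and digits, including non-ASCII ones
    return "".join(c for c in text if c.isalnum())


print(keep_alnum("base, sample test;"))  # basesampletest
```

Because `str.isalnum()` is Unicode-aware, a character such as "é" survives this filter; an ASCII-only version needs an explicit character set instead.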
    
  • 2020-12-24 02:03

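    You can use a predefined character class in Python regexes: \W matches any non-word character. With the re.ASCII flag that is exactly the set [^a-zA-Z0-9_]; without it, Unicode letters and digits also count as word characters. Note that the underscore is kept either way. Then,

    import re
    s = 'Hello dutrow 123'
    re.sub(r'\W', '', s)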
    --> 'Hellodutrow123'
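Two details of `\W` are easy to miss: it never removes the underscore, and in Python 3 it leaves non-ASCII letters alone unless `re.ASCII` is passed. A small sketch of the difference, where the sample string and the variable names are made up for illustration:

```python
import re

# "naïve café" written with escapes so the exact code points are unambiguous
s = "snake_case: na\u00efve caf\u00e9 123!"

# Default str matching: Unicode letters and the underscore both survive
unicode_aware = re.sub(r"\W", "", s)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the accented letters are removed
ascii_only = re.sub(r"\W", "", s, flags=re.ASCII)

# Adding "_" to the class removes underscores as well
strict = re.sub(r"[\W_]", "", s, flags=re.ASCII)

print(ascii_only)  # snake_casenavecaf123
print(strict)      # snakecasenavecaf123
```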
    
  • 2020-12-24 02:10

    r'\W' is the same as [^A-Za-z0-9_] under re.ASCII; by default in Python 3 it also leaves accented and other Unicode letters and digits in place.

    >>> re.sub(r'\W', '', 'text 1, 2, 3...')
    'text123'
    

    Maybe you want to keep the spaces or have all the words (and numbers):

    >>> re.findall(r'\w+', 'my. text, --without-- (punctuation) 123')
    ['my', 'text', 'without', 'punctuation', '123']
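To keep the spaces in a single string rather than a list of words, one option is to exclude whitespace from the characters being removed. This is a sketch under the assumption that runs of whitespace should also be collapsed to one space; the variable names are illustrative.

```python
import re

text = "my. text, --without-- (punctuation) 123"

# Remove anything that is neither a word character nor whitespace,
# plus the underscore, which \w would otherwise keep
cleaned = re.sub(r"[^\w\s]|_", "", text)

# Collapse any runs of whitespace left behind into single spaces
cleaned = " ".join(cleaned.split())

print(cleaned)  # my text without punctuation 123
```

For this input the result matches `" ".join(re.findall(r"\w+", text))`, which may be the simpler choice when underscores are acceptable.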
    
  • 2020-12-24 02:14

    In a character set [...] you can specify ^ as the first character to mean "not in":

    import re
    re.sub("[^0-9a-zA-Z]",        # Anything except 0..9, a..z and A..Z
           "",                    # replaced with nothing
           "this is a test!!")    # in this string
    
    --> 'thisisatest'
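When the same cleanup runs on many strings, the negated character set can be compiled once and reused. A minimal sketch, where `NON_ALNUM` and `strip_non_alnum_ascii` are hypothetical names chosen for this example:

```python
import re

# Compiled once at import time; "+" lets each run of unwanted characters
# be replaced in a single substitution
NON_ALNUM = re.compile(r"[^0-9a-zA-Z]+")


def strip_non_alnum_ascii(text):
    """Hypothetical helper: drop everything except ASCII letters and digits."""
    return NON_ALNUM.sub("", text)


print(strip_non_alnum_ascii("this is a test!!"))  # thisisatest
```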
    
  • 2020-12-24 02:26

    There are other approaches you may consider as well. For example, simply loop through the string and skip the unwanted characters. Assuming you want to delete all ASCII characters that are not letters or digits:

    >>> import string
    >>> newstring = [c for c in "a!1#b$2c%3\t\nx" if c in string.ascii_letters + string.digits]
    >>> "".join(newstring)
    'a1b2c3x'
    

    Or use str.translate to delete characters. In Python 3, build the deletion table with str.maketrans:

    >>> todelete = "".join(chr(i) for i in range(256) if chr(i) not in string.ascii_letters + string.digits)
    >>> table = str.maketrans("", "", todelete)
    >>> "a!1#b$2c%3\t\nx".translate(table)
    'a1b2c3x'
    

    This way the table only needs to be computed once (or todelete can be hard-coded) and can then be reused everywhere you need to convert a string.

  • 2020-12-24 02:26

    You need to be more specific:

    1. What about Unicode "letters", i.e., those with diacritics?
    2. What about white space? (I assume this is what you DO want to delete along with punctuation)
    3. When you say "letters" do you mean A-Z and a-z in ASCII only?
    4. When you say "numbers" do you mean 0-9 only? What about decimals, separators and exponents?

    It gets complex quickly...
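How much the answers to these questions matter can be seen by making them explicit parameters. The sketch below is one possible way to do that, not a definitive implementation; `strip_non_alnum`, `ascii_only` and `keep_whitespace` are hypothetical names, and numbers are treated as plain digit characters, so decimal points and signs are dropped.

```python
import string

ASCII_ALNUM = set(string.ascii_letters + string.digits)


def strip_non_alnum(text, ascii_only=True, keep_whitespace=False):
    """Hypothetical helper: drop characters that are not letters or digits."""
    def keep(c):
        if keep_whitespace and c.isspace():
            return True
        if ascii_only:
            return c in ASCII_ALNUM  # A-Z, a-z, 0-9 only
        return c.isalnum()           # any Unicode letter or numeric character
    return "".join(c for c in text if keep(c))


# "Café 5, ½ cup<TAB>1.5e3", with the non-ASCII characters written as escapes
sample = "Caf\u00e9 5, \u00bd cup\t1.5e3"

print(strip_non_alnum(sample))                    # Caf5cup15e3
print(strip_non_alnum(sample, ascii_only=False))  # Café5½cup15e3
```

Note that "1.5e3" becomes "15e3" in every mode: keeping decimals, separators or exponents intact needs a rule about numbers, not just about characters.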

    A great place to start is an interactive regex site, such as RegExr

    There is also a Python-specific option: Python Regex Tool
