In Python, how to check if a string only contains certain characters?

悲&欢浪女 2020-12-02 12:36


I need to check that a string contains only the characters a-z, 0-9, and . (period), and no other characters.

7 answers
  •  忘掉有多难
    2020-12-02 13:21

    EDIT: Changed the regular expression to exclude A-Z

    The regular expression solution is the fastest pure-Python solution so far:

    >>> import re, timeit
    >>> reg = re.compile(r'^[a-z0-9.]+$')
    >>> bool(reg.match('jsdlfjdsf12324..3432jsdflsdf'))
    True
    >>> timeit.Timer("reg.match('jsdlfjdsf12324..3432jsdflsdf')", "import re; reg = re.compile(r'^[a-z0-9.]+$')").timeit()
    0.70509696006774902
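One caveat with this pattern: in Python's re module, `$` also matches just before a trailing newline, so a string such as 'abc\n' passes the check. A minimal sketch that avoids this by using re.fullmatch (Python 3.4+); the helper name only_allowed_chars is hypothetical, chosen for illustration:

```python
import re

# Lowercase ASCII letters, ASCII digits and a literal period.
# Inside a character class the period needs no escaping.
ALLOWED_RE = re.compile(r'[a-z0-9.]+')

def only_allowed_chars(s):
    """Hypothetical helper: True if s is non-empty and uses only a-z, 0-9 and '.'."""
    # fullmatch() must consume the entire string, so a trailing '\n' is rejected.
    return ALLOWED_RE.fullmatch(s) is not None

print(only_allowed_chars('jsdlfjdsf12324..3432jsdflsdf'))  # True
print(only_allowed_chars('abc\n'))                          # False
# With match() and '^...$', the final newline slips through:
print(bool(re.match(r'^[a-z0-9.]+$', 'abc\n')))            # True
```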
    

    Compared to other solutions:

    >>> timeit.Timer("set('jsdlfjdsf12324..3432jsdflsdf') <= allowed", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
    3.2119350433349609
    >>> timeit.Timer("all(c in allowed for c in 'jsdlfjdsf12324..3432jsdflsdf')", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
    6.7066690921783447
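For reference, the two alternatives timed above can be written as standalone functions. A minimal sketch; the function names are hypothetical, and allowed is built the same way as in the timing setup:

```python
import string

# Same allowed set as in the timing setup.
allowed = set(string.ascii_lowercase + string.digits + '.')

def check_with_subset(s):
    # True when every distinct character of s is in the allowed set.
    return set(s) <= allowed

def check_with_all(s):
    # all() stops at the first character that is not allowed.
    return all(c in allowed for c in s)

print(check_with_subset('test123.3'), check_with_all('test123.3'))  # True True
print(check_with_subset('Test123.3'), check_with_all('Test123.3'))  # False False
```

Unlike the `+` regex, both of these accept the empty string: the empty set is a subset of any set, and all() over an empty iterable is True.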
    

    If you want to allow empty strings, change it to:

    >>> reg = re.compile(r'^[a-z0-9.]*$')
    >>> bool(reg.match(''))
    True
    
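A short sketch of how the `+` and `*` quantifiers differ on the empty string. Note that match() returns a match object or None rather than a boolean, so the results are wrapped in bool() here:

```python
import re

plus = re.compile(r'^[a-z0-9.]+$')  # at least one character required
star = re.compile(r'^[a-z0-9.]*$')  # zero or more characters allowed

for s in ['', 'abc.1', 'abc-1']:
    # bool() turns the Match object / None into True / False.
    print(repr(s), bool(plus.match(s)), bool(star.match(s)))
# '' False True
# 'abc.1' True True
# 'abc-1' False False
```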

    By request, I'm restoring the other part of my answer. Note, however, that the following also accepts uppercase A-Z.

    You can use str.isalnum() after removing the periods:

    test_str.replace('.', '').isalnum()
    
    >>> 'test123.3'.replace('.', '').isalnum()
    True
    >>> 'test123-3'.replace('.', '').isalnum()
    False
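To make the caveat concrete: isalnum() accepts uppercase letters, and an empty string is never alphanumeric, so a string made only of periods is rejected. A minimal sketch comparing the replace/isalnum check with a fullmatch-based regex on a few inputs (both helper names are hypothetical):

```python
import re

def check_isalnum(s):
    # Strip periods, then require the rest to be letters/digits.
    return s.replace('.', '').isalnum()

def check_regex(s):
    # Only lowercase a-z, digits and '.', at least one character.
    return re.fullmatch(r'[a-z0-9.]+', s) is not None

for s in ['test123.3', 'TEST123.3', '...', 'test123-3']:
    print(repr(s), check_isalnum(s), check_regex(s))
# 'test123.3' True True
# 'TEST123.3' True False
# '...' False True
# 'test123-3' False False
```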
    

    EDIT: Using isalnum is much more efficient than the set solution:

    >>> timeit.Timer("'jsdlfjdsf12324..3432jsdflsdf'.replace('.', '').isalnum()").timeit()
    0.63245487213134766
    

    EDIT2: John gave an example where the above doesn't work, because isalnum() also accepts non-ASCII letters and digits. I changed the solution to handle this special case by using encode:

    test_str.replace('.', '').encode('ascii', 'replace').isalnum()
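In Python 3, encode('ascii', 'replace') returns bytes in which every non-ASCII character has become b'?', and bytes.isalnum() only recognizes ASCII letters and digits. A minimal sketch of this check, alongside an alternative (a swapped-in technique, not the one above) that uses str.isascii(), available since Python 3.7, to skip building the intermediate bytes object. The helper names are hypothetical, and neither version rejects uppercase A-Z:

```python
def check_encode(s):
    # Non-ASCII characters become b'?', which is not alphanumeric.
    return s.replace('.', '').encode('ascii', 'replace').isalnum()

def check_isascii(s):
    # Alternative (Python 3.7+): reject non-ASCII first, then use str.isalnum().
    t = s.replace('.', '')
    return t.isascii() and t.isalnum()

s = 'ABC\u0131\u0661'  # dotless i and Arabic-Indic digit one
print(s.isalnum())       # True: both are Unicode letters/digits
print(check_encode(s))   # False
print(check_isascii(s))  # False
```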
    

    It is still about twice as fast as the set solution:

    >>> timeit.Timer("u'ABC\u0131\u0661'.replace('.', '').encode('ascii', 'replace').isalnum()", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
    1.5719811916351318
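The numbers in this answer come from separate interactive timeit.Timer(...).timeit() calls, partly on different input strings, and absolute timings depend on the machine and Python version. A minimal sketch of a script that times all the approaches on the same input so they can be compared directly; the candidates dict is a hypothetical name, and no output is shown because results vary:

```python
import re
import string
import timeit

text = 'jsdlfjdsf12324..3432jsdflsdf'
reg = re.compile(r'^[a-z0-9.]+$')
allowed = set(string.ascii_lowercase + string.digits + '.')

# Each lambda adds the same small call overhead, so relative order is what matters.
candidates = {
    'regex match': lambda: reg.match(text),
    'set subset': lambda: set(text) <= allowed,
    'all() loop': lambda: all(c in allowed for c in text),
    'isalnum': lambda: text.replace('.', '').isalnum(),
    'encode + isalnum': lambda: text.replace('.', '').encode('ascii', 'replace').isalnum(),
}

for name, fn in candidates.items():
    # Best of 5 repeats of 50,000 calls each; lower is faster.
    best = min(timeit.repeat(fn, number=50_000, repeat=5))
    print(f'{name:18} {best:.4f} s')
```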
    

    In my opinion, regular expressions are the best way to solve this problem.
