In Python, how to check if a string only contains certain characters?

悲&欢浪女 2020-12-02 12:36

I need to check that a string contains only a..z, 0..9, and . (period), and no other characters.

7 answers
  •  鱼传尺愫
    2020-12-02 13:35

    Final(?) edit

    Answer, wrapped up in a function, with annotated interactive session:

    >>> import re
    >>> def special_match(strg, search=re.compile(r'[^a-z0-9.]').search):
    ...     return not bool(search(strg))
    ...
    >>> special_match("")
    True
    >>> special_match("az09.")
    True
    >>> special_match("az09.\n")
    False
    # The above test case is to catch out any attempt to use re.match()
    # with a `$` instead of `\Z` -- see point (6) below.
    >>> special_match("az09.#")
    False
    >>> special_match("az09.X")
    False
    >>>
    

    Note: A comparison with re.match() appears further down in this answer. Further timings show that match() would win with much longer strings. However, match() seems to have a much larger overhead than search() when the final answer is True. This is puzzling (perhaps it is the cost of returning a match object instead of None) and may warrant further investigation.
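To reproduce the search() versus match() comparison on a current interpreter, a timing harness along these lines may help. It is only a sketch: the helper names `via_search` and `via_match`, the 1000x repetition used to build a "much longer" string, and the loop counts are assumptions chosen for illustration, and the numbers will vary by machine and Python version.

```python
import re
import timeit

# Two ways to express "only a-z, 0-9 and '.'":
search_bad = re.compile(r'[^a-z0-9.]').search   # look for one disallowed character
match_all = re.compile(r'[a-z0-9.]+\Z').match   # match the whole (non-empty) string


def via_search(s):
    # True when no disallowed character is found (also True for "").
    return not search_bad(s)


def via_match(s):
    # True when the entire string consists of allowed characters (False for "").
    return bool(match_all(s))


short = 'jsdlfjdsf12324..3432jsdflsdf'
long_ = short * 1000  # hypothetical "much longer" input

for label, s in (('short', short), ('long', long_)):
    # Both approaches must agree before their timings are worth comparing.
    assert via_search(s) == via_match(s)
    for fn in (via_search, via_match):
        best = min(timeit.repeat(lambda: fn(s), number=200, repeat=3))
        print(f'{label:5} {fn.__name__:10} {best / 200 * 1e6:.2f} usec per call')
```

Note that the two helpers differ on the empty string, because the match() pattern uses `+`; change it to `*` if an empty string should count as valid.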

    ==== Earlier text ====
    

    The [previously] accepted answer could use a few improvements:

    (1) Presentation gives the appearance of being the result of an interactive Python session:

    reg=re.compile('^[a-z0-9\.]+$')
    >>>reg.match('jsdlfjdsf12324..3432jsdflsdf')
    True
    

    but match() doesn't return True; it returns a match object on success and None on failure.
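A short script makes point (1) concrete. This is an illustrative sketch, not the accepted answer's code: the names `reg`, `hit`, and `miss` are made up here, and the pattern is the cleaned-up form used later in this answer.

```python
import re

reg = re.compile(r'[a-z0-9.]+\Z')

hit = reg.match('jsdlfjdsf12324..3432jsdflsdf')
miss = reg.match('UPPERCASE')

# A successful match() yields a match object, which is merely truthy.
print(type(hit).__name__, bool(hit))

# A failed match() yields None, which is falsy.
print(miss, bool(miss))

# Wrap the call in bool() (or compare against None) to get a real True/False.
print(bool(reg.match('abc.123')))
```

This is why an interactive session can never show a bare `True` straight after calling match().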

    (2) For use with match(), the ^ at the start of the pattern is redundant, and appears to be slightly slower than the same pattern without the ^

    (3) It should foster the habit of using a raw string automatically for any re pattern

    (4) The backslash in front of the dot/period is redundant
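Points (2) to (4) can be checked directly. The following is a minimal sketch with made-up sample strings; it only tests behaviour and says nothing about the speed difference mentioned in (2).

```python
import re

samples = ['az09.', '', 'az09.X', 'a-b', '..', 'AZ', 'x\n']

# (2) and (4): with match(), a leading ^ adds nothing, and the dot needs
# no backslash inside a character class.
with_extras = re.compile(r'^[a-z0-9\.]+\Z')
cleaned_up = re.compile(r'[a-z0-9.]+\Z')

for s in samples:
    a = bool(with_extras.match(s))
    b = bool(cleaned_up.match(s))
    assert a == b
    print(repr(s), a, b)

# (4): inside [...] an unescaped dot is a literal dot, not "any character".
assert re.match(r'[.]', '.') is not None
assert re.match(r'[.]', 'x') is None

# (3): a raw string keeps the backslash as a character of its own,
# so the regex engine sees exactly what was typed.
assert len(r'\n') == 2   # backslash followed by 'n'
assert len('\n') == 1    # a single newline character
```

On recent Python versions, an unrecognised escape such as a backslash before a dot in a non-raw string literal also triggers a warning, which is one more reason to reach for `r'...'` by default.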

    (5) Slower than the OP's code!

    prompt>rem OP's version -- NOTE: OP used raw string!
    
    prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9\.]')" "not bool(reg.search(t))"
    1000000 loops, best of 3: 1.43 usec per loop
    
    prompt>rem OP's version w/o backslash
    
    prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[^a-z0-9.]')" "not bool(reg.search(t))"
    1000000 loops, best of 3: 1.44 usec per loop
    
    prompt>rem cleaned-up version of accepted answer
    
    prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile(r'[a-z0-9.]+\Z')" "bool(reg.match(t))"
    100000 loops, best of 3: 2.07 usec per loop
    
    prompt>rem accepted answer
    
    prompt>\python26\python -mtimeit -s"t='jsdlfjdsf12324..3432jsdflsdf';import re;reg=re.compile('^[a-z0-9\.]+$')" "bool(reg.match(t))"
    100000 loops, best of 3: 2.08 usec per loop
    

    (6) Can produce the wrong answer!!

    >>> import re
    >>> bool(re.compile('^[a-z0-9\.]+$').match('1234\n'))
    True # uh-oh
    >>> bool(re.compile('^[a-z0-9\.]+\Z').match('1234\n'))
    False
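
The reason for the wrong answer in point (6) is that `$` also matches just before a trailing newline, whereas `\Z` matches only at the very end of the string. On Python 3.4 and later, `fullmatch()` is another way to get the strict behaviour; it is not part of the original answer, and the names `ALLOWED` and `only_allowed` below are hypothetical. A minimal sketch:

```python
import re

ALLOWED = re.compile(r'[a-z0-9.]+')


def only_allowed(s):
    # fullmatch() succeeds only if the pattern consumes the entire string.
    return ALLOWED.fullmatch(s) is not None


for s in ('1234', '1234\n', 'az09.', 'az09.X', ''):
    with_dollar = bool(re.match(r'[a-z0-9.]+$', s))
    with_slash_z = bool(re.match(r'[a-z0-9.]+\Z', s))
    print(repr(s), with_dollar, with_slash_z, only_allowed(s))
```

As written, the `+` rejects the empty string, unlike special_match() at the top of this answer; use `*` instead if an empty string should pass.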
    
