In Python, how to check if a string only contains certain characters?
I need to check that a string contains only a..z, 0..9, and . (period), and no other characters.
EDIT: Changed the regular expression to exclude A-Z
The regular expression solution is the fastest pure-Python solution so far:
reg = re.compile(r'^[a-z0-9\.]+$')
>>> bool(reg.match('jsdlfjdsf12324..3432jsdflsdf'))
True
>>> timeit.Timer("reg.match('jsdlfjdsf12324..3432jsdflsdf')", "import re; reg=re.compile('^[a-z0-9\.]+$')").timeit()
0.70509696006774902
Compared to other solutions:
>>> timeit.Timer("set('jsdlfjdsf12324..3432jsdflsdf') <= allowed", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
3.2119350433349609
>>> timeit.Timer("all(c in allowed for c in 'jsdlfjdsf12324..3432jsdflsdf')", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
6.7066690921783447
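For readability, the two alternatives timed above can be written as functions. This is only a sketch: the names ALLOWED, only_allowed_set and only_allowed_all are made up for illustration. Note that both versions accept the empty string.

```python
import string

# Characters permitted by the question: a-z, 0-9 and the period.
# ALLOWED and both function names are hypothetical, chosen for this sketch.
ALLOWED = frozenset(string.ascii_lowercase + string.digits + ".")


def only_allowed_set(text):
    """Set comparison: build a set from the whole string, then test subset."""
    return set(text) <= ALLOWED


def only_allowed_all(text):
    """all() over a generator: stops at the first disallowed character."""
    return all(ch in ALLOWED for ch in text)


print(only_allowed_set("jsdlfjdsf12324..3432jsdflsdf"))  # True
print(only_allowed_all("jsdlfjdsf12324..3432jsdflsdf"))  # True
print(only_allowed_set("abc-123"))                       # False
print(only_allowed_all(""))                              # True (empty string passes)
```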
If you want the regular expression to allow empty strings, change the + to *:
reg = re.compile(r'^[a-z0-9\.]*$')
>>> bool(reg.match(''))
True
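One thing to watch with the patterns above: in Python, $ also matches just before a trailing newline, so 'abc\n' passes the match-based check. A possible way around it, assuming Python 3.4 or later, is re.fullmatch. The sketch below is not part of the timed solution, and the names NON_EMPTY, MAYBE_EMPTY and is_valid are made up here.

```python
import re

# Same character class as in the answer; fullmatch makes ^ and $ unnecessary.
# NON_EMPTY, MAYBE_EMPTY and is_valid are hypothetical names for this sketch.
NON_EMPTY = re.compile(r"[a-z0-9.]+")
MAYBE_EMPTY = re.compile(r"[a-z0-9.]*")


def is_valid(text, allow_empty=False):
    """Return True if text consists only of a-z, 0-9 and periods."""
    pattern = MAYBE_EMPTY if allow_empty else NON_EMPTY
    # fullmatch succeeds only if the entire string matches the pattern.
    return pattern.fullmatch(text) is not None


print(is_valid("jsdlfjdsf12324..3432jsdflsdf"))  # True
print(is_valid(""))                              # False
print(is_valid("", allow_empty=True))            # True
print(is_valid("abc\n"))                         # False
# For comparison, the anchored pattern lets the trailing newline through:
print(bool(re.match(r"^[a-z0-9.]+$", "abc\n")))  # True
```

If fullmatch is not available, replacing $ with \Z in the original pattern should have the same effect.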
By request, I'm restoring the other part of the answer. Please note, however, that the following also accepts the A-Z range.
You can use isalnum:
test_str.replace('.', '').isalnum()
>>> 'test123.3'.replace('.', '').isalnum()
True
>>> 'test123-3'.replace('.', '').isalnum()
False
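To make the caveats concrete, here is a small sketch of the isalnum approach wrapped in a function, plus one possible way to also reject uppercase letters. The function names are made up, and the lowercase check is an addition of this sketch, not part of the timed one-liner.

```python
def dotted_alnum(text):
    """The isalnum approach: drop the periods, then test what is left."""
    return text.replace(".", "").isalnum()


def dotted_alnum_lower(text):
    """Hypothetical variant that additionally rejects uppercase letters."""
    return dotted_alnum(text) and text == text.lower()


print(dotted_alnum("test123.3"))      # True
print(dotted_alnum("TEST123"))        # True: uppercase is accepted
print(dotted_alnum_lower("TEST123"))  # False
print(dotted_alnum("..."))            # False: nothing is left after removing periods
print(dotted_alnum(""))               # False: isalnum is False for an empty string
```

Unlike the regular expression, this rejects a string made up only of periods.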
EDIT: Using isalnum is much more efficient than the set solution:
>>> timeit.Timer("'jsdlfjdsf12324..3432jsdflsdf'.replace('.', '').isalnum()").timeit()
0.63245487213134766
EDIT2: John gave an example where the above doesn't work. I changed the solution to handle this special case by using encode:
test_str.replace('.', '').encode('ascii', 'replace').isalnum()
And it is still about twice as fast as the set solution:
>>> timeit.Timer("u'ABC\u0131\u0661'.encode('ascii', 'replace').replace('.','').isalnum()", "import string; allowed = set(string.ascii_lowercase + string.digits + '.')").timeit()
1.5719811916351318
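The timings above are from Python 2. As a rough sketch of how the same trick behaves in Python 3, where every str is Unicode and encode returns bytes (the function name ascii_dotted_alnum is made up):

```python
def ascii_dotted_alnum(text):
    """isalnum check that treats any non-ASCII character as invalid."""
    # With errors="replace", each non-ASCII character is encoded as b"?",
    # and b"?" is not alphanumeric, so the bytes-level isalnum fails.
    encoded = text.replace(".", "").encode("ascii", "replace")
    return encoded.isalnum()


sample = "ABC\u0131\u0661"  # dotless i and an Arabic-Indic digit

print(sample.isalnum())                 # True: str.isalnum accepts them
print(ascii_dotted_alnum(sample))       # False
print(ascii_dotted_alnum("test123.3"))  # True
```

This still accepts A-Z, as noted above.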
In my opinion, using regular expressions is the best way to solve this problem.