I know how to do this if I iterate through all of the characters in the string but I am looking for a more elegant method.
There are a variety of ways of achieving this goal, some are clearer than others. For each of my examples, 'True' means that the string passed is valid, 'False' means it contains invalid characters.
First of all, there's the naive approach:
import string
allowed = string.letters + string.digits + '_' + '-'
def check_naive(mystring):
return all(c in allowed for c in mystring)
Then there's use of a regular expression, you can do this with re.match(). Note that '-' has to be at the end of the [] otherwise it will be used as a 'range' delimiter. Also note the $ which means 'end of string'. Other answers noted in this question use a special character class, '\w', I always prefer using an explicit character class range using [] because it is easier to understand without having to look up a quick reference guide, and easier to special-case.
import re
CHECK_RE = re.compile('[a-zA-Z0-9_-]+$')
def check_re(mystring):
return CHECK_RE.match(mystring)
Another solution noted that you can do an inverse match with regular expressions, I've included that here now. Note that [^...] inverts the character class because the ^ is used:
CHECK_INV_RE = re.compile('[^a-zA-Z0-9_-]')
def check_inv_re(mystring):
return not CHECK_INV_RE.search(mystring)
You can also do something tricky with the 'set' object. Have a look at this example, which removes from the original string all the characters that are allowed, leaving us with a set containing either a) nothing, or b) the offending characters from the string:
def check_set(mystring):
return not set(mystring) - set(allowed)