How can I write a regex that matches only letters?
In Python, I have found the following to work:
[^\W\d_]
This works because we are creating a new character class (the []) which excludes (^) any character from the class \W (for ASCII text, everything NOT in [a-zA-Z0-9_]), also excludes any digit (\d), and also excludes the underscore (_).
That is, we have taken the character class [a-zA-Z0-9_] and removed the 0-9 and _ bits. You might ask, wouldn't it just be easier to write [a-zA-Z] then, instead of [^\W\d_]? You would be correct if dealing only with ASCII text, but when dealing with Unicode text:
\W
Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_].
(quoted from the Python re module documentation)
That is, we are taking everything considered to be a word character in Unicode, removing everything considered to be a digit character in Unicode, and also removing the underscore.
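To make the ASCII versus Unicode difference concrete, here is a small sketch comparing the two patterns on a string containing non-ASCII letters. The sample string and the variable names are my own, chosen only for illustration; the last call uses the re.ASCII flag mentioned in the documentation quote above.

```python
import re

# Illustrative sample (not from the question); it renders as: naïve Ω_42 日本
sample = "na\u00efve \u03a9_42 \u65e5\u672c"

# Unicode-aware: word characters minus digits minus underscore
unicode_hits = re.findall(r"[^\W\d_]", sample)

# ASCII-only letter class: misses the accented, Greek and CJK letters
ascii_hits = re.findall(r"[a-zA-Z]", sample)

# With re.ASCII, \W and \d fall back to their ASCII definitions,
# so the first pattern behaves like [a-zA-Z]
ascii_flag_hits = re.findall(r"[^\W\d_]", sample, flags=re.ASCII)

print(unicode_hits)     # ['n', 'a', 'ï', 'v', 'e', 'Ω', '日', '本']
print(ascii_hits)       # ['n', 'a', 'v', 'e']
print(ascii_flag_hits)  # ['n', 'a', 'v', 'e']
```

So if you only ever want a-z and A-Z, either write [a-zA-Z] directly or pass re.ASCII.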
For example, the following code snippet
import re

# Raw string, so the backslashes reach the regex engine unchanged
regex = r"[^\W\d_]"
test_string = "A;,./>>?()*)&^*&^%&^#Bsfa1 203974"
re.findall(regex, test_string)
Returns
['A', 'B', 's', 'f', 'a']
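If the goal is to check that an entire string consists only of letters, rather than to pull out individual letters, one option is to add a + quantifier and use re.fullmatch. This is just a sketch; the helper name is_all_letters is hypothetical and not part of the re module.

```python
import re

# One or more characters that are word characters but not digits or underscore
LETTERS_ONLY = re.compile(r"[^\W\d_]+")

def is_all_letters(text):
    """Return True if text is non-empty and every character is a letter."""
    return LETTERS_ONLY.fullmatch(text) is not None

print(is_all_letters("Bsfa"))        # True
print(is_all_letters("Bsfa1"))       # False, contains a digit
print(is_all_letters("snake_case"))  # False, contains an underscore
print(is_all_letters(""))            # False, + requires at least one letter
```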