Regex to match only letters

孤城傲影 2020-11-22 16:24

How can I write a regex that matches only letters?

20 answers
  •  萌比男神i
    2020-11-22 16:43

    In Python, I have found the following to work:

    [^\W\d_]
    

    This works because the brackets define a negated character class ([^...]) that excludes every character in \W (everything that is not a word character; for ASCII text, everything NOT in [a-zA-Z0-9_]), every digit (\d), and the underscore (_).
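
    As a minimal sketch of that reasoning, the snippet below tests the class against one character at a time. The sample characters are chosen here purely for illustration; only letters should pass.

```python
import re

# The negated class from the answer: not a non-word char, not a digit, not "_".
pattern = re.compile(r"[^\W\d_]")

samples = ["a", "Z", "7", "_", "-", " "]
results = [bool(pattern.fullmatch(ch)) for ch in samples]

for ch, matched in zip(samples, results):
    print(repr(ch), matched)
# 'a' True
# 'Z' True
# '7' False
# '_' False
# '-' False
# ' ' False
```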

    That is, we have taken the character class [a-zA-Z0-9_] and removed the 0-9 and _ parts. You might ask whether it would be easier to write [a-zA-Z] instead of [^\W\d_]. For ASCII-only text it would be, but for Unicode text the definition of \W matters:

    \W

    Matches any character which is not a word character. This is the opposite of \w. If the ASCII flag is used this becomes the equivalent of [^a-zA-Z0-9_].

    ^ from the Python re module documentation

    That is, we are taking everything considered to be a word character in Unicode, removing everything considered to be a digit character in Unicode, and also removing the underscore.
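
    To make the ASCII versus Unicode difference concrete, here is a small sketch comparing the two classes on a string with accented letters. The sample string is an assumption made up for this illustration, written with escapes so the code points are unambiguous.

```python
import re

unicode_letters = re.compile(r"[^\W\d_]")        # str patterns are Unicode-aware by default
ascii_letters = re.compile(r"[a-zA-Z]")          # plain ASCII range
ascii_flagged = re.compile(r"[^\W\d_]", re.ASCII)  # same class, restricted by the ASCII flag

# "Café_42 ß" spelled with escapes (U+00E9 and U+00DF).
sample = "Caf\u00e9_42 \u00df"

print(unicode_letters.findall(sample))  # ['C', 'a', 'f', 'é', 'ß']
print(ascii_letters.findall(sample))    # ['C', 'a', 'f']
print(ascii_flagged.findall(sample))    # ['C', 'a', 'f']
```

    With re.ASCII the negated class behaves like [a-zA-Z], which matches the documentation quoted above.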

    For example, the following code snippet:

    import re
    # Raw string so the backslashes reach the regex engine unchanged.
    regex = r"[^\W\d_]"
    test_string = "A;,./>>?()*)&^*&^%&^#Bsfa1 203974"
    re.findall(regex, test_string)
    

    returns:

    ['A', 'B', 's', 'f', 'a']
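
    If the goal is to check that a whole string consists only of letters, rather than to pull letters out of it, one possible approach is to add a + quantifier and use re.fullmatch. This is a sketch under that assumption; the helper name is_letters_only is hypothetical and not part of the answer above.

```python
import re

# One or more characters from the letters-only class.
LETTERS_ONLY = re.compile(r"[^\W\d_]+")

def is_letters_only(text: str) -> bool:
    """Return True if text is non-empty and every character is a letter."""
    # fullmatch requires the pattern to cover the entire string.
    return LETTERS_ONLY.fullmatch(text) is not None

print(is_letters_only("Hello"))       # True
print(is_letters_only("Hello1"))      # False
print(is_letters_only("snake_case"))  # False
print(is_letters_only(""))            # False
```

    Using fullmatch avoids having to add ^ and $ anchors by hand.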
    
