Purpose of [^\x20-\x7E] in regular expressions

情深已故 2020-12-13 18:04
 [^\\x20-\\x7E]

I saw this pattern used for a regular expression in which the goal was to remove non-ascii characters from a string. What does it

4 Answers
  •  借酒劲吻你
    2020-12-13 18:15

    It matches any single character that is not a printable ASCII character.

    Printable characters include a to z, A to Z, 0 to 9, the space, and symbols such as ",;$#%.

    ^      not (negates the character class)
    \x20   hex code for the space character
    -      to (range operator)
    \x7e   hex code for the ~ (tilde) character
    

    All the printable ASCII characters fall between these two.

    The expression therefore matches non-ASCII characters as well as ASCII control (non-printing) characters such as bell, tab, null and others.
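As a minimal sketch of how the pattern is typically used to strip such characters, here is a Python version. The helper name `strip_non_printable_ascii` and the sample string are illustrative assumptions, not part of the original question.

```python
import re

# Matches any single character outside the printable ASCII range
# 0x20 (space) through 0x7E (~).
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7E]")


def strip_non_printable_ascii(text: str) -> str:
    """Remove ASCII control characters and all non-ASCII characters."""
    return NON_PRINTABLE_ASCII.sub("", text)


# The sample contains an accented letter, a tab, a bell and a null byte.
sample = "caf\u00e9\tbar\x07\x00 ~ok"
print(repr(strip_non_printable_ascii(sample)))  # prints 'cafbar ~ok'
```

Note that the accented letter is removed along with the control characters, which is exactly the behaviour the pattern describes.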

    Look at

    man ascii
    

    on a Unix system for the full ASCII table, which shows the characters that fall outside this range.
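If no man page is at hand, the same information can be derived by testing the pattern against each 7-bit code point. This is only a sketch in Python; the variable names are illustrative.

```python
import re

pattern = re.compile(r"[^\x20-\x7E]")

# Collect every 7-bit ASCII code point (0 to 127) that the pattern matches.
matched = [code for code in range(128) if pattern.fullmatch(chr(code))]

print(matched)
```

Within ASCII, the matches are the control codes 0 to 31 plus 127 (DEL). Every code point above 127 matches as well.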

    In Perl, you could also write this as

    [^ -~]
    

    or

    [[:^cntrl:]]
    

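    This last one is different: it matches any non-control character, including printable ASCII, extended ASCII (e.g. accented characters) and Unicode. It therefore describes the characters to keep rather than the ones to remove; its complement, [[:cntrl:]], matches the control characters.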

    You may not want to restrict yourself to just ASCII, since non-US locales often use valid printable characters outside this small range, e.g. øüéåç...
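One possible way to drop control characters while keeping accented letters is sketched below in Python. Python's standard `re` module does not support POSIX bracket classes such as `[[:cntrl:]]`, so this sketch substitutes the Unicode general category `Cc` (control) from `unicodedata`; the helper name `strip_control_chars` is an assumption made for illustration.

```python
import unicodedata


def strip_control_chars(text: str) -> str:
    """Drop Unicode control characters (general category Cc), keep the rest."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


# The sample mixes accented letters with a tab and a bell character.
sample = "na\u00efve\tcaf\u00e9\x07 \u00f8\u00fc"
cleaned = strip_control_chars(sample)

# ascii() keeps the output safe on terminals that cannot encode non-ASCII text.
print(ascii(cleaned))
```

Category `Cc` also covers tab, newline and carriage return, so if those should survive they would need to be excluded from the filter explicitly.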
