Why is an underscore (_) not regarded as a non-word character?

Submitted by 大憨熊 on 2021-01-21 07:23:32

Question


Why is an underscore (_) not regarded as a non-word character? The regex \W matches every non-word character, but it does not match the underscore.


Answer 1:


According to Jeffrey Friedl's book on regular expressions, this goes back to a change made to Perl's regular expressions in 1988, so that \w would match the characters allowed in a Perl variable name [page 89]:

Perl 2 was released in June 1988. Larry had replaced the regex code entirely, this time using a greatly enhanced version of the Henry Spencer package mentioned in the previous section. You could still have at most nine sets of parentheses, but now you could use | inside them. Support for \d and \s was added, and support for \w was changed to include an underscore, since then it would match what characters were allowed in a Perl variable name.




Answer 2:


\W is defined as [^A-Za-z0-9_].

It is the opposite of \w, which is [A-Za-z0-9_] and means "a word character".

It is not about words as you perceive them in a spoken language. A "word" here means an identifier. Most programming languages allow letters (uppercase and lowercase), digits, and underscores (_) in identifiers.
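A quick way to see this is to split a string into word and non-word characters. The following is only a minimal sketch in Python; the sample string and variable names are made up for illustration, and re.ASCII is passed because Python's \w otherwise also matches Unicode letters and digits rather than just [A-Za-z0-9_].

```python
import re

# Hypothetical sample text containing an identifier-like token with an underscore.
sample = "foo_bar-baz 42!"

# re.ASCII restricts \w to [a-zA-Z0-9_]; without it, \w is Unicode-aware in Python 3.
word_chars = re.findall(r"\w", sample, flags=re.ASCII)
non_word_chars = re.findall(r"\W", sample, flags=re.ASCII)

# \w+ groups consecutive word characters, so "foo_bar" stays in one piece.
tokens = re.findall(r"\w+", sample, flags=re.ASCII)

print("_" in word_chars)  # True
print(non_word_chars)     # ['-', ' ', '!']
print(tokens)             # ['foo_bar', 'baz', '42']
```

The underscore ends up among the word characters, which is why foo_bar is matched as a single identifier-like token while the hyphen splits the text.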




Answer 3:


According to regex101, \W matches any non-word character (equal to [^a-zA-Z0-9_]). This seems to be a deliberate design choice.
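The stated equivalence can be checked over the ASCII range. This is a rough sketch in Python, not regex101's own engine; the variable names are illustrative, and re.ASCII is an assumption needed to make Python's \W behave like the ASCII-only class.

```python
import re

# \W in ASCII mode versus the explicit negated character class.
shorthand = re.compile(r"\W", flags=re.ASCII)
explicit = re.compile(r"[^a-zA-Z0-9_]")

# Every character in the 7-bit ASCII range.
ascii_chars = [chr(code) for code in range(128)]

# Characters on which the two patterns disagree.
disagreements = [
    c for c in ascii_chars
    if bool(shorthand.fullmatch(c)) != bool(explicit.fullmatch(c))
]

print(disagreements)                    # []
print(bool(shorthand.fullmatch("_")))   # False
```

Both patterns agree on every ASCII character, and neither matches the underscore.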




Answer 4:


The definition of a "word character" is based on the characters that can be used as part of an identifier in many programming languages, that is, [A-Za-z0-9_].



Source: https://stackoverflow.com/questions/49533901/why-is-an-underscore-not-regarded-as-a-non-word-character
