(\\d+|) vs (\\d+)?
[\\w\\W] vs [\\d\\D] vs .
Is there any difference between t
The second one is quite interesting, and I would like to say something about it:

[\w\W] and [\d\D] are equivalent, and both are also equivalent to [\s\S]. \W is the complement character set of \w, and the same holds for the \D - \d pair and the \S - \s pair. Therefore, when both halves of a pair are put together in one character class, the class matches any character without exception.
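As a quick check of the equivalence, here is a minimal sketch in Python (chosen only for illustration; the sample characters are my own assumption, not an exhaustive test). It checks that each of the three classes accepts every sample, including line terminators:

```python
import re

# Sample characters, picked for illustration: a letter, a digit, a space,
# several line terminators, and a non-ASCII letter.
samples = ["a", "7", " ", "\n", "\r", "\u2028", "\u00e9"]

# Each class combines a shorthand with its complement.
any_char_classes = [r"[\w\W]", r"[\d\D]", r"[\s\S]"]

# For each class, record whether it matches every sample character.
results = {
    cls: all(re.fullmatch(cls, ch) is not None for ch in samples)
    for cls in any_char_classes
}
print(results)
```

Every entry in results should be True, since each character falls on one side of the shorthand or the other.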
They are usually used when the flavor has no construct to "match any character, without exception". JavaScript, at least before the s (dotAll) flag introduced in ES2018, is one example of such a case. JavaScript also has a lesser-known and highly confusing construct for this, [^], which is usually invalid in other flavors.
Dot . generally matches any character except the newline \n. Depending on the language, it may exclude more characters.
For Java, it excludes \n, \r, \u0085, \u2028, and \u2029. So a . is equivalent to [^\n\r\u0085\u2028\u2029].
For JavaScript, dot . excludes \r, \u2028, and \u2029 in addition to \n. So . is equivalent to [^\n\r\u2028\u2029].
Some languages have a mode that makes . match any character, without exception. It is called DOTALL mode in Java and Python, and SingleLine mode in C# and Perl.
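To illustrate the mode switch, here is a small sketch using Python's re.DOTALL flag and its inline form (?s). The sample text is made up for the example:

```python
import re

text = "first\nsecond"  # illustrative input containing one newline

# Default mode: the dot stops at the newline.
default_match = re.match(r".*", text).group()

# DOTALL mode: the dot also matches the newline.
dotall_match = re.match(r".*", text, flags=re.DOTALL).group()

# The inline flag (?s) turns on the same mode from inside the pattern.
inline_match = re.match(r"(?s).*", text).group()

print(repr(default_match), repr(dotall_match), repr(inline_match))
```

The inline form is handy when the pattern is passed to code that does not let you set flags separately.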
The behavior of . varies from language to language. Generally, they all agree that \n should be excluded in "normal" mode, but they differ slightly in which additional characters they exclude.
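As one more data point on how flavors differ, the sketch below probes Python's re module (not covered above, so treat it as an added illustration) with the line terminators that Java excludes. In default mode, Python's dot rejects only \n:

```python
import re

# Candidate characters: the five that Java's dot excludes by default.
candidates = {
    "\\n": "\n",
    "\\r": "\r",
    "\\u0085": "\u0085",
    "\\u2028": "\u2028",
    "\\u2029": "\u2029",
}

# Collect the names of the characters that a bare dot refuses to match.
excluded = [name for name, ch in candidates.items()
            if re.fullmatch(r".", ch) is None]
print(excluded)
```

The same probe, written against another engine, is a quick way to find out what its dot excludes.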