Is [\s\S] the same as . (dot)?

Submitted by 爱⌒轻易说出口 on 2020-12-13 09:36:06

Question


When we include a shorthand character class and its negated counterpart in the same character class, is the result the same as the dot ., which means any character?

I did a test on regex101.com and every character matched.

Are [\s\S], [\w\W] and [\d\D] the same as . ?

I want to know whether this behavior is consistent across the web's front-end and back-end languages, such as JavaScript, PHP, Python and others.


Answer 1:


No, it is not the same. There is an important difference when you are not using the single-line flag: without that flag, . does not match every character (it skips newlines).

[\s\S] comes in handy when you want to mix both kinds of match in a pattern where . does not match everything.

It is easier to explain with an example. Suppose you want to capture whatever is between a and b. You can use the pattern a(.*?)b (the ? makes the match ungreedy and the parentheses capture the content). If the content may contain newlines and you do not want to capture that case in the same group, you can use a second regex such as a([\s\S]*?)b.

Therefore if we create one pattern using both approaches it results in:

a(.*?)b|a([\s\S]*?)b

If you try this pattern on regex101, the matches are color-coded, which makes the two cases easy to tell apart (capturing group #1 in green, capturing group #2 in red).
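
The same distinction can be reproduced outside regex101. Below is a minimal Python sketch, assuming the combined pattern with both groups written lazily, as in the two separate patterns above; the helper name which_group and the sample strings are made up for illustration.

```python
import re

# Combined pattern: group 1 uses ".", group 2 uses "[\s\S]".
PATTERN = re.compile(r"a(.*?)b|a([\s\S]*?)b")

def which_group(text):
    """Return 1 or 2 for the group that took part in the first match, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return 1 if m.group(1) is not None else 2

print(which_group("a-one-line-b"))       # 1
print(which_group("a\ntwo\nlines\nb"))   # 2
```

With no flags set, the single-line input is captured by group 1, while the input containing newlines can only be captured by group 2.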

So, in conclusion, [\s\S] is a regex trick for matching across multiple lines when . does not suit your needs. Whether you need it depends on your use case.

However, if you use the single-line flag, under which . matches newlines, you don't need the trick: everything is then captured by group 1 (green), and group 2 (red above) is not matched.
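
As a rough illustration of the flag's effect, here is a small Python sketch. It assumes Python's re.DOTALL as the equivalent of the single-line flag, and the sample text is invented for the example.

```python
import re

text = "a\ntwo\nlines\nb"
pattern = r"a(.*?)b|a([\s\S]*?)b"

# Without a flag, "." stops at the newline, so only group 2 can match.
plain = re.search(pattern, text)

# With re.DOTALL, "." matches "\n" too, so the first alternative
# succeeds and group 2 never takes part.
dotall = re.search(pattern, text, re.DOTALL)

print(plain.groups())    # (None, '\ntwo\nlines\n')
print(dotall.groups())   # ('\ntwo\nlines\n', None)
```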

I also created a JavaScript performance test; the difference in performance is around 25%:

https://jsperf.com/ss-vs-dot




Answer 2:


The answer is: it depends.
If your regex engine matches every character with . then yes, the result is the same. If it doesn't, then no, the result is not the same. In standard JavaScript, for example, . does not match line breaks.
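
One way to check a particular engine is to test every character individually. The sketch below does this for Python's re module only; the helper unmatched and its limit parameter are hypothetical names, and the scan is restricted to code points below 0x10000 to keep it quick. Other engines may exclude a different set of characters (JavaScript without the s flag, for instance, also excludes \r, \u2028 and \u2029).

```python
import re

DOT = re.compile(r".")
ANY = re.compile(r"[\s\S]")

def unmatched(pattern, limit=0x10000):
    """List the characters below `limit` that `pattern` fails to match."""
    return [chr(cp) for cp in range(limit) if pattern.fullmatch(chr(cp)) is None]

print(unmatched(DOT))   # ['\n']
print(unmatched(ANY))   # []
```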




Answer 3:


By default, "." does not match the newline character, and it still does not match it under Perl's multiline (/m) modifier; only the /s modifier changes that. So, with a little Perl script like

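#!/usr/bin/perl
use strict;
use warnings;

$/ = "---";              # records are separated by three dashes
my $i    = 0;            # record counter
my $patA = 'a[\d\D]b';   # [\d\D] matches any character, newline included
my $patB = 'a.b';        # . does not match newline by default
while (<>) {
    $i++;
    print "$i: $_";
    print "    patA matches\n" if $_ =~ /$patA/;
    print "    patB matches\n" if $_ =~ /$patB/;
}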

you can pipe some input into it to test, like

$ cat |./aboveskript.pl
a
b

End the input with CTRL-D; to test multiple records, separate them with three dashes. The output of the above is

1: a
b
    patA matches

So the pattern /a.b/ fails.
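
The same experiment can be sketched in Python, whose re.MULTILINE flag plays a similar role to Perl's /m. The helper name found and the record string are assumptions made for this example.

```python
import re

record = "a\nb\n"

def found(pattern, flags=0):
    """Return True if `pattern` matches somewhere in the sample record."""
    return re.search(pattern, record, flags) is not None

# re.MULTILINE only changes where ^ and $ match; "." still skips "\n".
print(found(r"a.b"))                 # False
print(found(r"a.b", re.MULTILINE))   # False
print(found(r"a[\d\D]b"))            # True
print(found(r"a.b", re.DOTALL))      # True
```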



Source: https://stackoverflow.com/questions/44246215/is-s-s-same-as-dot
