Question
sample.txt contains:
abcde
abde
Can anybody explain the output of the following commands?
grep '[[ab]]' sample.txt - no output
grep '[ab[]]' sample.txt - no output
grep '[ab[]' sample.txt - output is abcde, abde
grep '[ab]]' sample.txt - no output
And what do [(ab)] and [^(ab)] mean? Are they the same as [ab] and [^ab]?
Answer 1:
The first thing to understand is that inside a character class, almost none of the regex meta-characters has any special meaning; they are matched literally. (The few exceptions are ], a leading ^, and a - between two characters.) For example, a * will match a literal * and will not mean "zero or more repetitions". Similarly, ( and ) will match a literal ( and ), and will not create a capture group.
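To see this literal treatment in practice, here is a minimal sketch. The demo file and its three lines are made up for illustration and are not part of the question, and mktemp is assumed to be available on your system.

```shell
# Hypothetical demo input (not from the question): one line with a literal *,
# one with parentheses, and one with neither.
demo=$(mktemp)
printf '%s\n' 'a*b' 'f(x)' 'plain' > "$demo"

# Inside brackets, * is a literal asterisk, not a repetition operator.
star_lines=$(grep '[*]' "$demo")
echo "$star_lines"     # prints: a*b

# Inside brackets, ( and ) are literal parentheses, not a group.
paren_lines=$(grep '[()]' "$demo")
echo "$paren_lines"    # prints: f(x)

rm -f "$demo"
```

The line "plain" is never printed, because it contains neither an asterisk nor a parenthesis.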
Now, when a ] is found inside a character class (anywhere except as its very first character), it closes the class, and the characters after it are not part of the class. With that, let's look at what is happening above:
In patterns 1, 2, and 4, your character class ends at the first closing ]. So the last closing bracket, ], is not part of the character class and has to be matched separately. Your patterns therefore behave like this (in informal notation):
'[[ab]]' is the same as '([|a|b)(])' // The last `]` has to match.
'[ab[]]' is the same as '(a|b|[)(])' // Again, the last `]` has to match.
'[ab]]' is the same as '(a|b)(])' // Same, the last `]` has to match.
    ^---- The character class closes here, at the first `]`.
Now, since neither line in the file contains a ] at all, no match is found.
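One way to check this reading is to run the three patterns against the question's file and against a variant that does contain a ]. The sketch below assumes mktemp is available; with_bracket.txt and its contents are hypothetical and exist only for this demonstration.

```shell
work=$(mktemp -d)
printf '%s\n' 'abcde' 'abde' > "$work/sample.txt"         # the file from the question
printf '%s\n' 'abcde' 'ab]de' > "$work/with_bracket.txt"  # hypothetical variant with a ]

summary=''
for pattern in '[[ab]]' '[ab[]]' '[ab]]'; do
    # grep -c prints the number of matching lines; it exits non-zero when that
    # number is 0, hence the "|| true".
    plain=$(grep -c "$pattern" "$work/sample.txt" || true)
    bracket=$(grep -c "$pattern" "$work/with_bracket.txt" || true)
    summary="$summary$pattern sample.txt=$plain with_bracket.txt=$bracket
"
done
printf '%s' "$summary"
# [[ab]] sample.txt=0 with_bracket.txt=1
# [ab[]] sample.txt=0 with_bracket.txt=1
# [ab]] sample.txt=0 with_bracket.txt=1

rm -rf "$work"
```

In the variant file, only the line ab]de matches, because there a b is immediately followed by a literal ].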
In the 3rd pattern, by contrast, the character class is closed only by the last ], so everything before it is inside the character class.
'[ab[]' means: match a line that contains 'a', or 'b', or '['
This is perfectly valid and matches both lines.
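As a quick illustration of the third pattern, the sketch below adds two made-up lines to the question's data: one that contains only a [ and one that contains none of the three characters. The extra lines and the use of mktemp are assumptions for the demo.

```shell
demo=$(mktemp)
# The first two lines are from the question; the last two are hypothetical additions.
printf '%s\n' 'abcde' 'abde' '[xyz' 'xyz' > "$demo"

# The class contains a, b, and [; any one of them anywhere in the line is enough.
matched=$(grep '[ab[]' "$demo")
echo "$matched"
# abcde
# abde
# [xyz

rm -f "$demo"
```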
And what do [(ab)] and [^(ab)] mean?
[(ab)] means: match any one of the characters (, a, b, ). Remember, inside a character class, ( and ) have no special meaning, so you can't create groups inside a character class.
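[^(ab)] is the exact opposite of [(ab)]: it matches any single character that is not one of (, a, b, ). Note that grep prints a line as soon as one such character occurs in it, so grep '[^(ab)]' sample.txt prints both lines (they contain c, d, and e). It does not select only the lines that are free of those characters.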
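One subtlety worth checking by hand: a negated class matches a single character, and grep selects a line if any character in it matches. To select lines that contain none of the listed characters, you can invert the match with -v instead. A minimal sketch, with made-up demo lines and assuming mktemp is available:

```shell
demo=$(mktemp)
# Hypothetical demo lines (only the first one comes from the question's file).
printf '%s\n' 'abcde' 'ab' '(ab)' 'xyz' > "$demo"

# Lines with at least one character that is NOT one of ( a b )
negated=$(grep '[^(ab)]' "$demo")
echo "$negated"
# abcde
# xyz

# Lines with NO occurrence of ( a b ): invert the match of the positive class.
inverted=$(grep -v '[(ab)]' "$demo")
echo "$inverted"
# xyz

rm -f "$demo"
```

The line abcde is selected by the negated class because of its c, d, and e, even though it also contains a and b.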
Is it the same as [ab] and [^ab]?
No. Those two do not include ( and ), so they are slightly different.
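The difference shows up only on input that actually contains parentheses. The sketch below uses three hypothetical lines chosen to separate the four classes; mktemp is assumed to be available.

```shell
demo=$(mktemp)
printf '%s\n' '()' 'ab' '(ab)' > "$demo"   # hypothetical demo lines

with_ab=$(grep '[ab]' "$demo")             # ab and (ab)
with_parens_ab=$(grep '[(ab)]' "$demo")    # all three lines
not_ab=$(grep '[^ab]' "$demo")             # () and (ab): parentheses count as "other"
# No line has a character outside ( a b ), so grep exits non-zero: "|| true".
not_parens_ab=$(grep '[^(ab)]' "$demo" || true)

printf '[ab]:\n%s\n' "$with_ab"
printf '[(ab)]:\n%s\n' "$with_parens_ab"
printf '[^ab]:\n%s\n' "$not_ab"
printf '[^(ab)]:\n%s\n' "$not_parens_ab"

rm -f "$demo"
```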
Answer 2:
I'll give it a try:
grep '[[ab]]' - matches lines containing one of "[", "a", "b" immediately followed by "]"
grep '[ab[]]' - matches lines containing one of "a", "b", "[" immediately followed by "]"
grep '[ab[]' - matches lines containing one of "a", "b", "["
grep '[ab]]' - matches lines containing "a" or "b" immediately followed by "]"
grep '[(ab)]' - matches lines containing one of "(", "a", "b", ")"
grep '[^(ab)]' - matches lines containing at least one character other than "(", "a", "b", ")"
grep '[ab]' - matches lines containing "a" or "b"
grep '[^ab]' - matches lines containing at least one character other than "a" and "b"
You can try those grep commands on this example:
# create a file with the lines below:
abcde
abde
[abcd
abcd]
abc[]foo
abc]bar
[ab]cdef
a(b)cde
You will see the difference; compare the results with my comments above.
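If you would rather script the experiment, the following sketch writes the example lines to a temporary file and counts the matching lines per pattern. The file name example.txt is hypothetical, and mktemp is assumed to be available.

```shell
work=$(mktemp -d)
# example.txt is a hypothetical name for the file described above.
cat > "$work/example.txt" <<'EOF'
abcde
abde
[abcd
abcd]
abc[]foo
abc]bar
[ab]cdef
a(b)cde
EOF

counts=''
for pattern in '[[ab]]' '[ab[]]' '[ab[]' '[ab]]' '[(ab)]' '[^(ab)]' '[ab]' '[^ab]'; do
    # grep -c prints the number of matching lines ("|| true" covers a count of 0).
    n=$(grep -c "$pattern" "$work/example.txt" || true)
    counts="$counts$pattern=$n "
done
echo "$counts"
# [[ab]]=2 [ab[]]=2 [ab[]=8 [ab]]=1 [(ab)]=8 [^(ab)]=8 [ab]=8 [^ab]=8

# The lines behind the small counts:
two_lines=$(grep '[[ab]]' "$work/example.txt")   # abc[]foo and [ab]cdef
one_line=$(grep '[ab]]' "$work/example.txt")     # [ab]cdef
echo "$two_lines"
echo "$one_line"

rm -rf "$work"
```

Only the patterns that require a literal ] after the class are selective here; every line of the example contains an a, so the single-class patterns match all eight lines.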
Source: https://stackoverflow.com/questions/14891871/grep-ambiguity-nested-square-bracket