Grep ambiguity nested square bracket

Question

sample.txt contains:

abcde
abde

Can anybody explain the output of the following commands?

grep '[[ab]]' sample.txt - no output
grep '[ab[]]' sample.txt - no output
grep '[ab[]' sample.txt - output is abcde , abde
grep '[ab]]' sample.txt - no output

And what do [(ab)] and [^(ab)] mean? Are they the same as [ab] and [^ab]?

Answer 1:

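The first thing to understand is that inside a character class (a bracket expression), most regex metacharacters have no special meaning and are matched literally. For example, * matches a literal * instead of meaning zero or more repetitions, and ( and ) match the characters ( and ) instead of creating a capture group. The exceptions are ], which closes the class; ^ as the first character, which negates it; - between two characters, which forms a range; and a [ that starts a POSIX sequence such as [:alpha:].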
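A quick way to see this is to use the same characters inside and outside brackets. This is a minimal sketch assuming a POSIX shell and GNU grep; the input lines are invented for illustration.

```shell
# Outside brackets, * means "zero or more of the previous item",
# so 'a*b' matches every line that contains a b.
printf 'a*b\nab\nb\n' | grep -c 'a*b'       # prints 3

# Inside brackets, * is just a literal asterisk.
printf 'a*b\nab\nb\n' | grep -c 'a[*]b'     # prints 1 (only "a*b")

# In an ERE, (x) is a group, so 'f(x)' matches the text "fx" ...
printf 'f(x)\nfx\n' | grep -E 'f(x)'        # prints fx

# ... but inside brackets ( and ) are ordinary characters.
printf 'f(x)\nfx\n' | grep -E 'f[(]x[)]'    # prints f(x)
```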

Next, a ] inside a character class closes it, and any characters after it are no longer part of that class. The one exception is a ] that comes first, right after [ or [^: it is taken literally, as in []ab] or [^]ab]. Now, let's see what happens in each of your commands.
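First, the closing rule and its exception in isolation. This is a small sketch assuming GNU grep; the input lines are made up.

```shell
# [a] is the class; the second ] must then appear literally after the a.
printf 'a]\na\n' | grep '[a]]'        # prints a]

# A ] placed first in the class is literal: []x] is the set { ], x }.
printf ']\nx\ny\n' | grep '[]x]'      # prints ] and x

# The same holds right after ^: [^]x] means "neither ] nor x".
printf ']\nx\ny\n' | grep '[^]x]'     # prints y
```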

In commands 1, 2 and 4, the character class ends at the first ]. The final ] is therefore not part of the class; it has to be matched separately, immediately after the character the class matched. So each pattern matches something like this:

'[[ab]]' is equivalent to '(\[|a|b)(])'  // The last `]` has to match.
'[ab[]]' is equivalent to '(a|b|\[)(])'  // Again, the last `]` has to match.
'[ab]]'  is equivalent to '(a|b)(])'     // Same, the last `]` has to match.
    ^
    ^---- Character class closes here.

Since neither line contains a ] anywhere, nothing matches.
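To confirm that the missing ] is the only reason for the empty output, here is a sketch (GNU grep assumed) that keeps the two lines from sample.txt and adds three invented lines containing a ]:

```shell
# Two lines from sample.txt plus three invented lines that contain a ].
lines='abcde
abde
b]
[]
x]'

printf '%s\n' "$lines" | grep '[[ab]]'   # prints b] and []
printf '%s\n' "$lines" | grep '[ab[]]'   # prints b] and []
printf '%s\n' "$lines" | grep '[ab]]'    # prints only b] ([ is not in [ab])
```

The line x] never matches, because x is in none of the three classes.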

In the 3rd command, by contrast, the only ] is the last one, so it is what closes the class, and everything before it is inside the class.

'[ab[]' means: match a line that contains an a, a b, or a [

That is a perfectly valid pattern, and it matches both lines.
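Because [ab[] is a single class, the [ inside it is an ordinary character. A sketch assuming GNU grep, with an invented line consisting only of [:

```shell
printf '[\nxyz\n' | grep '[ab[]'      # prints [  (the class includes a literal [)
printf '[\nxyz\n' | grep -c '[ab]'    # prints 0  (without it, [ is not matched)
```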

And what does [(ab)] and [^(ab)] mean?

[(ab)] matches any one of the characters (, a, b and ). Remember, ( and ) have no special meaning inside a character class, so you cannot create a group there.

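[^(ab)] is the negation of [(ab)]: it matches any single character that is not (, a, b or ). It does not select "lines that contain none of these characters"; grep prints every line that contains at least one other character. For example, abcde matches because of c, d and e.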
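The per-character behaviour is easy to check. In this sketch (GNU grep assumed, input lines invented), [^(ab)] is compared with grep -v '[(ab)]', which is one way to get the lines that contain none of those characters:

```shell
lines='abcde
ab
(ab)
xyz'

# Lines with at least one character outside the set ( a b ):
printf '%s\n' "$lines" | grep '[^(ab)]'     # prints abcde and xyz

# Lines with no (, a, b or ) at all:
printf '%s\n' "$lines" | grep -v '[(ab)]'   # prints xyz
```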

Is it the same as [ab] and [^ab] ?

No. [ab] and [^ab] leave out ( and ), so [(ab)] also matches a ( or ), while [^(ab)] does not match them.
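A sketch with GNU grep and three invented lines, (), a and z, makes the difference visible:

```shell
lines='()
a
z'

printf '%s\n' "$lines" | grep '[(ab)]'    # prints () and a
printf '%s\n' "$lines" | grep '[ab]'      # prints a
printf '%s\n' "$lines" | grep '[^ab]'     # prints () and z
printf '%s\n' "$lines" | grep '[^(ab)]'   # prints z
```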

Answer 2:

Here is my attempt:

grep '[[ab]]'  - matches lines containing one of [, a, b immediately followed by ]
grep '[ab[]]'  - matches lines containing one of a, b, [ immediately followed by ]
grep '[ab[]'   - matches lines containing one of a, b, [
grep '[ab]]'   - matches lines containing one of a, b immediately followed by ]
grep '[(ab)]'  - matches lines containing one of (, a, b, )
grep '[^(ab)]' - matches lines containing at least one character other than (, a, b, )
grep '[ab]'    - matches lines containing one of a, b
grep '[^ab]'   - matches lines containing at least one character other than a, b

You can try these grep commands on the following example:

# Create a file with the following lines:
abcde
abde
[abcd
abcd]
abc[]foo
abc]bar
[ab]cdef
a(b)cde

Run the commands against it and compare the output with the explanations above; you will see the difference.
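One way to run all eight commands at once is the script below. It is a sketch assuming a POSIX shell and GNU grep; the file name example.txt is arbitrary.

```shell
#!/bin/sh
# Recreate the example file from this answer.
cat > example.txt <<'EOF'
abcde
abde
[abcd
abcd]
abc[]foo
abc]bar
[ab]cdef
a(b)cde
EOF

# Print each pattern followed by the lines it matches, joined on one line.
for pat in '[[ab]]' '[ab[]]' '[ab[]' '[ab]]' '[(ab)]' '[^(ab)]' '[ab]' '[^ab]'; do
  printf '%-9s -> ' "$pat"
  grep "$pat" example.txt | tr '\n' ' '
  echo
done
```

With GNU grep, '[[ab]]' and '[ab[]]' should print abc[]foo and [ab]cdef, and '[ab]]' only [ab]cdef. The other five patterns print all eight lines: every line contains an a (so the positive classes match) and also some character outside a, b, ( and ) (so the negated classes match too).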

Source: https://stackoverflow.com/questions/14891871/grep-ambiguity-nested-square-bracket

Tags

regex

grep