Question
Part of me worries that this question will get closed, but I'm genuinely baffled by something. In every language's regex implementation that I've used, capturing groups are indexed from one, even when the rest of the language is indexed from zero. The usual design reason I can think of for 1-indexing is to lower the barrier to entry for non-technical people; however, regex is already hellish and incomprehensible, so that argument doesn't really seem to hold here.
Additionally, since each language seems to have its own small tweaks to regex, it would seem sensible to make capturing-group indexing consistent with the rest of the language.
Is there some other explanation? The idea has popped into my head that the 1-indexing is a result of something deeper within the belly of regex, such as something inherently taking up the zero spot. That said, I wasn't able to find any documentation on this particular quirk. Are there any regex masters out there who are aware of something deeper going on here, or is it just a holdover from seriously legacy code?
Answer 1:
"In every language's regex that I've used, the capturing groups are indexed at one, even when the rest of the language is indexed at zero."
I guess by "the rest of the language" you mean arrays and other container types. Well, in regex, capture groups do start at 0, but it is not obvious at first.
Capture group 0 contains the complete match, and the groups from 1 onward are the ones you create explicitly with parentheses, ().
So, in the regex below, applied to the string "ab123cd":
ab(\d+)cd
There are really two groups:
- Group 0 is the complete match: ab123cd
- Group 1 is the group you captured using (): 123
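To make this concrete, here is a minimal Java sketch using the standard java.util.regex.Pattern and Matcher classes. The class name GroupZeroDemo and the helper groups are hypothetical, chosen only for illustration; the sketch reads group 0 (the whole match) and group 1 (the parenthesized digits) for the pattern above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {
    // Hypothetical helper: returns {group 0, group 1} for the pattern ab(\d+)cd,
    // or null if the input contains no match.
    static String[] groups(String input) {
        Pattern p = Pattern.compile("ab(\\d+)cd");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null; // no match, so there are no groups to read
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("ab123cd");
        System.out.println("group 0: " + g[0]); // the whole match: ab123cd
        System.out.println("group 1: " + g[1]); // the digits captured by (\d+): 123
    }
}
```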
From there on, groups are numbered in the order in which their opening parentheses ( appear.
So, for the regex below (whitespace added for readability; it is not part of the actual pattern):
ab( x (\d+))cd
  ^   ^
  |   |
  |   group 2
  group 1
When you apply the above regex to the string "abx123cd", you get the following groups:
- Group 0 (complete match): abx123cd
- Group 1 (pattern in the first opening parenthesis): x123
- Group 2 (pattern in the second opening parenthesis): 123
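The nested example can be checked the same way. This is a sketch under an assumption of my own: it compiles the spaced-out pattern with Java's Pattern.COMMENTS flag, which makes the engine ignore whitespace in the pattern, so the readability spaces can stay in. The class name NestedGroupsDemo and the helper groups are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedGroupsDemo {
    // Hypothetical helper: returns groups 0..groupCount() for the nested pattern,
    // or null if the input contains no match.
    static String[] groups(String input) {
        // Pattern.COMMENTS ignores the spaces, so this is equivalent to ab(x(\d+))cd.
        Pattern p = Pattern.compile("ab( x (\\d+))cd", Pattern.COMMENTS);
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return null;
        }
        // groupCount() excludes group 0, hence the + 1.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            // Group numbers follow the order of the opening parentheses.
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("abx123cd");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + ": " + g[i]);
        }
    }
}
```

Running main should print the three groups listed above, in order: abx123cd, x123, 123.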
In Java, you can retrieve all of those groups with the following methods:
- Matcher.group() to get group 0 (note: it takes no parameters), and
- Matcher.group(int) to get the rest of the groups (note the int parameter, which takes the number of the group you want).
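A short sketch putting the two methods side by side, reusing the first example's pattern; MatcherMethodsDemo and its inspect helper are hypothetical names. Per the Matcher Javadoc, m.group() is equivalent to m.group(0), and groupCount() reports only the capturing groups, so group 0 is not counted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Hypothetical helper: runs the pattern once and collects
    // {group(), group(0), group(1), groupCount()} as strings.
    // Assumes the pattern has at least one capturing group.
    static String[] inspect(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null; // no match, so no groups to read
        }
        return new String[] {
            m.group(),                      // no argument: group 0, the whole match
            m.group(0),                     // the same group, requested by number
            m.group(1),                     // first parenthesized group
            String.valueOf(m.groupCount())  // capturing groups only, group 0 excluded
        };
    }

    public static void main(String[] args) {
        for (String s : inspect("ab(\\d+)cd", "ab123cd")) {
            System.out.println(s);
        }
    }
}
```

For "ab123cd" this yields ab123cd twice, then 123, then a group count of 1: the pattern declares one capturing group, and group 0 comes on top of it.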
Source: https://stackoverflow.com/questions/17791639/why-are-regex-capturing-groups-indexed-at-one