Why are regex capturing groups indexed at one?

Submitted by 巧了我就是萌 on 2019-12-20 03:59:27

Question


Part of me is worried that this question will get closed, but I'm genuinely baffled by something. In every language's regex that I've used, the capturing groups are indexed at one, even when the rest of the language is indexed at zero. I thought about the design decisions that would lead to 1-indexing; the usual one is lowering the barrier to entry for non-technical people. But for regex, which is already hellish and incomprehensible, that argument doesn't really seem to hold.

Additionally, since each language seems to have its own small tweaks on regex, it seems like it would be sensible to have capturing group indexing be consistent with the rest of the language.

Is there some other explanation? The idea has popped into my head that the 1-indexing is a result of something deeper within the belly of regex (like something inherently taking up the zero spot) or something along those lines. That said, I wasn't able to find any documentation on this particular quirk. Are there any regex masters out there that are aware of something deeper going on here, or is it just something in seriously legacy code?


Answer 1:


In every language's regex that I've used, the capturing groups are indexed at one, even when the rest of the language is indexed at zero.

I guess that by "the rest of the language" you mean arrays and other container types. Well, in regex, capture groups do start at 0, but that is not obvious at first.

Capture group 0 contains the complete match, and the capture groups after it are the ones you create explicitly with parentheses - ().

So, for the regex below, applied to the string "ab123cd":

ab(\d+)cd

There are really two groups:

  • Group 0 - the complete match - ab123cd
  • Group 1 - the group you captured using () - 123
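
The two groups listed above can be checked with a minimal Java sketch. The class name SimpleGroupDemo and the helper method match are made up for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the helper method are illustrative only.
class SimpleGroupDemo {
    // Returns {group 0, group 1} for the pattern ab(\d+)cd,
    // or null if the whole input does not match.
    static String[] match(String input) {
        Matcher m = Pattern.compile("ab(\\d+)cd").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] groups = match("ab123cd");
        System.out.println("Group 0: " + groups[0]); // the complete match
        System.out.println("Group 1: " + groups[1]); // the digits inside ()
    }
}
```

For the input "ab123cd" this should report ab123cd as group 0 and 123 as group 1.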

From there on, the groups are numbered in the order in which their opening parentheses ( occur.

So, for the regex below (whitespace added for readability):

ab(    x   (\d+))cd
  ^        ^
  |        |
 group 1  group 2

When you apply the above regex to the string "abx123cd", you get the following groups:

  • Group 0 - the complete match - abx123cd
  • Group 1 - the pattern starting at the first opening parenthesis - x123
  • Group 2 - the pattern starting at the second opening parenthesis - 123
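
The nested numbering can be sketched the same way in Java. The pattern here is written without the readability whitespace, and NestedGroupDemo and match are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing that groups are numbered
// by the position of their opening parenthesis.
class NestedGroupDemo {
    // Returns {group 0, group 1, group 2} for ab(x(\d+))cd,
    // or null if the whole input does not match.
    static String[] match(String input) {
        Matcher m = Pattern.compile("ab(x(\\d+))cd").matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] groups = match("abx123cd");
        for (int i = 0; i < groups.length; i++) {
            System.out.println("Group " + i + ": " + groups[i]);
        }
    }
}
```

The outer parenthesis opens first, so it becomes group 1 (x123), and the inner one becomes group 2 (123).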

When you use this regex in Java, you can get all of those groups with the following methods:

  • Matcher.group() to get group 0 (note that it takes no parameters), and
  • Matcher.group(int) to get the other groups (note the int parameter, which is the number of the group you want; group(0) is equivalent to group())
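
As a rough illustration of how these methods fit together, the sketch below collects every group by index, starting at 0. It relies on Matcher.groupCount(), which by its documented convention does not count group 0, so the loop runs up to and including that value. GroupCountDemo and allGroups are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; allGroups is an illustrative helper.
class GroupCountDemo {
    // Collects group 0 through group N for the first match of regex in input.
    // Returns an empty list when nothing matches.
    static List<String> allGroups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            // groupCount() excludes group 0, hence the inclusive upper bound.
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i)); // may be null for a group that did not participate
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(allGroups("ab(x(\\d+))cd", "abx123cd"));
    }
}
```

Seen this way, the group list behaves like a zero-indexed array whose first slot is always taken by the complete match.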


Source: https://stackoverflow.com/questions/17791639/why-are-regex-capturing-groups-indexed-at-one
