Capturing Quantifiers and Quantifier Arithmetic

At the outset, let me explain that this question is neither about how to capture groups nor about how to use quantifiers, two features of regex I am perfectly familiar with.

  •  既然无缘
    2020-11-30 04:53

    Coming back five weeks later because I learned that .NET has something that comes very close to the idea of "quantifier capture" mentioned in the question. The feature is called "balancing groups".

    Here is the solution I came up with. It looks long, but it is quite simple.

    (?:@(?<c1>)(?<c2>)(?<c3>))+[^@=]+(?<-c1>=)+[^=-]+(?<-c2>-)+[^-/]+(?<-c3>/)+[^/]+(?(c1)(?!))(?(c2)(?!))(?(c3)(?!))
    
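    For comparison, Python's re module has no balancing groups, so one way to get the same check in Python is to capture each run of delimiters with an ordinary group and compare the run lengths in code afterwards. This is a minimal sketch that assumes the pattern is meant to match the whole input (hence fullmatch). The name balanced and the sample strings are made up for illustration.

```python
import re

# Assumption: the whole input must match, so fullmatch is used below.
# Same character classes as the .NET pattern, but each delimiter run is
# captured in an ordinary group so its length can be compared in code.
RUNS = re.compile(r"(@+)[^@=]+(=+)[^=-]+(-+)[^-/]+(/+)[^/]+")


def balanced(s: str) -> bool:
    """Return True if s has the @..=..-../.. shape with equally long runs."""
    m = RUNS.fullmatch(s)
    if m is None:
        return False
    lengths = {len(run) for run in m.groups()}
    return len(lengths) == 1  # all four runs have the same length


print(balanced('@@@ "A" === "B" --- "C" /// "D"'))
print(balanced('@@@ "A" == "B" --- "C" /// "D"'))
```

    The first call prints True. The second prints False because it has three @ but only two =. The counting moves out of the regex and into ordinary code, which is exactly the step that .NET's balancing groups let you keep inside the pattern.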

    How does it work?

    1. The first non-capturing group matches the @ characters. In that non-capturing group, we have three named groups c1, c2 and c3 that don't match anything, or rather, that match an empty string. These groups will serve as three counters c1, c2 and c3. Because .NET keeps track of intermediate captures when a group is quantified, every time an @ is matched, a capture is added to the capture collections for groups c1, c2 and c3.

    2. Next, [^@=]+ eats up all the characters up to the first =.

    3. The second quantified group (?<-c1>=)+ matches the = characters. That group seems to be named -c1, but -c1 is not a group name. -c1 is .NET syntax to pop one capture from the c1 group's capture collection into the ether. In other words, it allows us to decrement c1. If you try to decrement c1 when the capture collection is empty, the match fails. This ensures that we can never have more = than @ characters. (Later, we'll have to make sure that we cannot have more @ than = characters.)

    4. The next steps repeat steps 2 and 3 for the - and / characters, decrementing counters c2 and c3.

    5. The [^/]+ eats up the rest of the string.

    6. The (?(c1)(?!)) is a conditional that says "If group c1 has been set (that is, it still holds at least one capture), then fail". You may know that (?!) is a common trick to force a regex to fail. This conditional ensures that c1 has been decremented all the way to zero: in other words, there cannot be more @ than = characters.

    7. Likewise, the (?(c2)(?!)) and (?(c3)(?!)) ensure that there cannot be more @ than - and / characters.
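    To make the counter bookkeeping of steps 1 to 7 concrete, here is a hypothetical Python model that replays it by hand. Each capture collection is a list used as a stack. Every @ pushes onto c1, c2 and c3, every =, - and / pops from its counter, and any leftover capture means failure. It assumes a whole-string match and walks the input greedily instead of reproducing .NET's backtracking engine. That is enough here because each character class stops exactly where the next delimiter run begins. The function name balancing_model is an illustrative assumption.

```python
from typing import Callable


def balancing_model(s: str) -> bool:
    """Replay the balancing-group bookkeeping by hand (whole-string match assumed)."""
    # Lists standing in for the capture collections of groups c1, c2 and c3.
    stacks = {"c1": [], "c2": [], "c3": []}
    pos = 0

    def take(pred: Callable[[str], bool]) -> int:
        """Consume characters while pred holds; return how many were consumed."""
        nonlocal pos
        start = pos
        while pos < len(s) and pred(s[pos]):
            pos += 1
        return pos - start

    # Step 1: (?:@(?<c1>)(?<c2>)(?<c3>))+ pushes one empty capture per @.
    ats = take(lambda ch: ch == "@")
    if ats == 0:
        return False
    for stack in stacks.values():
        stack.extend([""] * ats)

    # Steps 2-4: a filler class, then a run of closers that each pop one capture.
    for excluded, closer, name in (("@=", "=", "c1"), ("=-", "-", "c2"), ("-/", "/", "c3")):
        if take(lambda ch: ch not in excluded) == 0:
            return False  # [^@=]+, [^=-]+ and [^-/]+ need at least one character
        closers = take(lambda ch: ch == closer)
        if closers == 0:
            return False  # (?<-cN>x)+ needs at least one closer
        for _ in range(closers):
            if not stacks[name]:
                return False  # popping an empty capture collection fails the match
            stacks[name].pop()

    # Step 5: [^/]+ must consume the rest of the string.
    if take(lambda ch: ch != "/") == 0 or pos != len(s):
        return False

    # Steps 6-7: (?(cN)(?!)) fails if any capture is left over.
    return all(not stack for stack in stacks.values())
```

    With this sketch, balancing_model("@@ a == b -- c // d") returns True. balancing_model("@@@ a == b -- c // d") returns False because one capture is left in each counter. balancing_model("@ a == b - c / d") also returns False because the second = tries to pop an empty c1.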

    I don't know about you, but even though this is a bit long, I find it really intuitive.
