Why/how is an additional variable needed when matching a repeated arbitrary character with capture groups?

故里飘歌 2020-12-20 13:50

I'm matching a sequence of a repeating arbitrary character, with a minimum length, using a Perl 6 regex.

After reading through https://docs.perl6.org/language/regex

3 Answers
  •  佛祖请我去吃肉
    2020-12-20 13:50

    Perl 6 regexes scale up to full grammars, which produce parse trees. Those parse trees are a tree of Match objects. Each capture - named or positional - is either a Match object or, if quantified, an array of Match objects.

    This is generally a good thing, but it does involve the trade-off you have observed: once you are inside a nested capturing element, you are populating a new Match object, with its own set of positional and named captures. For example, if we do:

    say "abab" ~~ /((a)(b))+/
    

    Then the result is:

    「abab」
     0 => 「ab」
      0 => 「a」
      1 => 「b」
     0 => 「ab」
      0 => 「a」
      1 => 「b」
    

    And we can then index:

    say $0;        # The array of the top-level capture, which was quantified
    say $0[1];     # The second Match
    say $0[1][0];  # The first Match within that Match object (the (a))
    

    It is a departure from regex tradition, but also an important part of scaling up to larger parsing challenges.
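
    The nesting described above can be checked by keeping the Match in a variable and indexing into it. This is a minimal sketch rather than part of the original answer: the variable name `$m` is chosen only for illustration, and the prefix `~` is used to stringify each Match.

    ```raku
    # Keep the Match object so it can be indexed explicitly instead of via $/.
    my $m = "abab" ~~ /((a)(b))+/;

    say $m[0].elems;    # 2: the quantified outer group matched twice
    say ~$m[0][1];      # ab: the second Match of the outer group
    say ~$m[0][1][0];   # a: the inner (a) capture of that Match
    say ~$m[0][1][1];   # b: the inner (b) capture of that Match
    ```

    Here `$m[0]` plays the same role as `$0` in the snippet above, since `$0` is shorthand for `$/[0]`.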
