Regex Group in Perl: how to capture elements into array from regex group that matches unknown number of/multiple/variable occurrences from a string?

遇见更好的自我 2020-12-04 10:10

In Perl, how can I use one regex grouping to capture more than one occurrence that matches it, into several array elements?

For example, for a string:



        
9 Answers
  •  庸人自扰
    2020-12-04 10:34

    With regular expressions, use a technique that I like to call tack-and-stretch: anchor on features you know will be there (tack) and then grab what's between (stretch).

    In this case, you know that a single assignment matches

    \b\w+=.+
    

    and you have many of these repeated in $string. Remember that \b means word boundary:

    A word boundary (\b) is a spot between two characters that has a \w on one side of it and a \W on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a \W.
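
    The definition can be checked directly. As a minimal sketch (the sample string and variable names are illustrative assumptions, not taken from the question), the loop below records every offset at which \b matches:

    ```perl
    use strict;
    use warnings;

    my $sample = 'var1=100';
    my @positions;

    # \b is zero-width, so each successful match only reports a position.
    while ( $sample =~ /\b/g ) {
        push @positions, pos($sample);
    }

    print "@positions\n";    # 0 4 5 8
    ```

    The boundaries fall at the start of the string, on either side of the = sign, and at the end of the string, which is why \b\w+= can only begin at the start of a name.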

    The values in the assignments can be a little tricky to describe with a regular expression, but you also know that each value will terminate with whitespace—although not necessarily the first whitespace encountered!—followed by either another assignment or end-of-string.

    To avoid repeating the assignment pattern, compile it once with qr// and reuse it in your pattern along with a look-ahead assertion (?=...) to stretch the match just far enough to capture the entire value while also preventing it from spilling into the next variable name.
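
    To see the look-ahead at work in isolation, here is a small sketch on a shortened, made-up string (the names $name, $matched and $end are assumptions for illustration only). It reuses one compiled qr// pattern twice and shows that the text inspected by the look-ahead is not consumed:

    ```perl
    use strict;
    use warnings;

    my $name   = qr/\b\w+=/;    # compiled once, reused twice below
    my $string = 'x=1 2 y=3';

    my ( $matched, $end ) = ( '', -1 );
    if ( $string =~ /($name.+?)(?=\s+$name)/ ) {
        # $+[0] is the offset where the whole match ended.
        ( $matched, $end ) = ( $1, $+[0] );
    }

    print "matched '$matched', ending at offset $end\n";
    # matched 'x=1 2', ending at offset 5
    ```

    The value stretches past the first space because "2" is not followed by =, and it stops before " y=3" because the look-ahead only peeks at that text.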

    Matching against your pattern in list context with m//g gives the following behavior:

    The /g modifier specifies global pattern matching—that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
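
    A short sketch of the two cases described in that passage, using a made-up two-assignment string (the variable names are illustrative assumptions):

    ```perl
    use strict;
    use warnings;

    my $string = 'var1=100 var2=90';

    # No capturing parentheses: each element is one whole match.
    my @whole = $string =~ /\w+=\d+/g;

    # Two capturing groups: the list holds every capture, in match order.
    my @parts = $string =~ /(\w+)=(\d+)/g;

    print join( '|', @whole ), "\n";    # var1=100|var2=90
    print join( '|', @parts ), "\n";    # var1|100|var2|90
    ```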

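    The pattern $assignment uses non-greedy .+? to cut off the value as soon as the look-ahead sees another assignment or end-of-string. Remember that the match returns the substrings from all capturing subpatterns, so the look-ahead's alternation uses non-capturing (?:...). An interpolated qr// object is likewise wrapped in a non-capturing group, so the full pattern contains no capturing parentheses at all and m//g returns each whole match. Because the look-ahead is zero-width, the text it inspects is not part of that match.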

    #! /usr/bin/perl
    
    use warnings;
    use strict;
    
    my $string = <<'EOF';
    var1=100 var2=90 var5=hello var3="a, b, c" var7=test var3=hello
    EOF
    
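    # One assignment: a name, "=", then as little of the value as possible.
    my $assignment = qr/\b\w+ = .+?/x;

    # Stretch each value until whitespace followed by end-of-string or another assignment.
    my @array = $string =~ /$assignment (?= \s+ (?: $ | $assignment))/gx;
    
    for my $i ( 0 .. $#array )
    {
      print "$i: $array[$i]\n";
    }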
    

    Output:

    0: var1=100
    1: var2=90
    2: var5=hello
    3: var3="a, b, c"
    4: var7=test
    5: var3=hello
