How can I store regex captures in an array in Perl?

孤城傲影 2020-12-01 03:38

Is it possible to store all matches for a regular expression into an array?

I know I can use ($1,...,$n) = m/expr/g;, but it seems as though that can on

7 answers
  • 2020-12-01 03:44

    Is it possible to store all matches for a regular expression into an array?

    Yes. Perl 5.25.7 (first stable release: 5.26.0) added the variable @{^CAPTURE}, which holds "the contents of the capture buffers, if any, of the last successful pattern match". It therefore contains ($1, $2, ...) even when the number of capture groups is not known in advance.
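
    As a minimal sketch of this variable (assuming Perl 5.26 or later; the sample string and pattern are illustrative and not taken from the question), the captures can be copied into an ordinary array right after a successful match:

    ```perl
    use v5.26;      # @{^CAPTURE} is not available on older perls
    use warnings;

    my $subject = "2020-12-01";   # illustrative input
    my @capture;

    if ($subject =~ /(\d+)-(\d+)-(\d+)/) {
        # Copy immediately: the variable reflects the last successful match.
        @capture = @{^CAPTURE};
    }

    print join(",", @capture), "\n";   # prints 2020,12,01
    ```

    Copying into a lexical array keeps the values safe from later matches in the same scope.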

    Before Perl 5.25.7 (since 5.6.0) you could build the same array using @- and @+ as suggested by @Jaques in his answer. You would have to do something like this:

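        my @capture = ();
        # $#+ is the number of capture groups in the last successful match
        for my $i (1 .. $#+) {
            # a group that did not take part in the match has undefined offsets
            push @capture, defined $-[$i]
                ? substr($subject, $-[$i], $+[$i] - $-[$i])
                : undef;
        }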
    
  • 2020-12-01 03:45

    If you're doing a global match (/g) then the regex in list context will return all of the captured matches. Simply do:

    my @matches = ( $str =~ /pa(tt)ern/g );
    

    This command for example:

    perl -le '@m = ( "foo12gfd2bgbg654" =~ /(\d+)/g ); print for @m'
    

    Gives the output:

    12
    2
    654
    
  • 2020-12-01 03:52

    I am surprised this is not already mentioned here, but the Perl documentation provides the standard variables @- and @+, which hold the start and end offsets of each submatch. To quote the documentation for @-:

    This array holds the offsets of the beginnings of the last successful submatches in the currently active dynamic scope.

    So, to get the value caught in the first capture, one would write:

    print substr( $str, $-[1], $+[1] - $-[1] ), "\n"; # equivalent to $1
    

    As a side note, there is also the standard variable %- which is very nifty, because it not only contains named captures, but also allows for duplicate names to be stored in an array.

    Using the example provided in the documentation:

    /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/
    

    would yield a hash with entries such as:

    $-{A}[0] : '1'
    $-{A}[1] : '3'
    $-{B}[0] : '2'
    $-{B}[1] : '4'
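
    A runnable sketch of that documentation example, assuming Perl 5.10 or later for named captures; the input string "1234" is chosen here only so that the pattern matches:

    ```perl
    use strict;
    use warnings;

    my (@a, @b);

    if ("1234" =~ /(?<A>1)(?<B>2)(?<A>3)(?<B>4)/) {
        # Each value in %- is an array reference holding every capture
        # that shares the same group name, in pattern order.
        @a = @{ $-{A} };
        @b = @{ $-{B} };
    }

    print "A: @a\n";   # prints A: 1 3
    print "B: @b\n";   # prints B: 2 4
    ```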
    
  • 2020-12-01 03:53

    See perldoc perlop, under "Matching in List Context":

    If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1 , $2 , $3 ...)

    The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

    You can simply grab all the matches by assigning to an array, or otherwise performing the evaluation in list context:

    my @matches = ($string =~ m/word/g);
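
    A short sketch of both list-context behaviours quoted above; the sample string and the second pattern are made up for illustration:

    ```perl
    use strict;
    use warnings;

    my $string = "word sword wordy";   # illustrative input

    # With /g and no parentheses: one element per match of the whole pattern.
    my @matches = ($string =~ m/word/g);

    # Without /g: only the capture groups of the first match are returned.
    my @first = ($string =~ m/(\w)ord/);

    print scalar(@matches), "\n";   # prints 3
    print "@first\n";               # prints w
    ```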
    
  • 2020-12-01 03:56

    Note that if you know the number of capturing groups you need per match, you can use this simple approach, which I present as an example (of 2 capturing groups.)

    Suppose you have some 'data' like

    my $mess = <<'IS_YOURS';
    Richard     Rich
    April           May
    Harmony             Ha\rm
    Winter           Win
    Faith     Hope
    William         Will
    Aurora     Dawn
    Joy  
    IS_YOURS
    

    With the following regex

    my $oven = qr'^(\w+)\h+(\w+)$'ma;  # skip the /a modifier if using perl < 5.14
    

    I can capture all 12 values (6 pairs rather than 8: the Harmony line is skipped because the backslash is not a word character, and Joy has no second word) in the @box below.

    my @box = $mess =~ m[$oven]g;
    

    If I want to "hash out" the details of the box I could just do:

    my %hash = @box;
    

    Or I could have skipped the box entirely:

    my %hash = $mess =~ m[$oven]g;
    

    Note that %hash contains the following. Order is lost and dupe keys (if any had existed) are squashed:

    (
              'April'   => 'May',
              'Richard' => 'Rich',
              'Winter'  => 'Win',
              'William' => 'Will', 
              'Faith'   => 'Hope',
              'Aurora'  => 'Dawn'
    );
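
    If order or duplicate keys matter, one possible alternative (not part of the approach above) is to walk the matches with /g in scalar context and store each pair explicitly. This is only a sketch; the sample text with a repeated key is hypothetical:

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample with a repeated key, unlike the data above.
    my $text = "Faith Hope\nJoy Glee\nFaith Trust\n";

    my @pairs;
    while ($text =~ /^(\w+)\h+(\w+)$/mg) {
        push @pairs, [$1, $2];    # one [key, value] pair per matching line
    }

    # Pairs come out in input order, duplicates included.
    print "$_->[0] => $_->[1]\n" for @pairs;
    ```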
    
  • 2020-12-01 04:01

    I think this is a self-explanatory example. Note the /g modifier in the first regex:

    use Data::Dumper;    # provides Dumper()
    
    my $string = "one two three four";
    
    my @res = $string =~ m/(\w+)/g;
    print Dumper(\@res); # @res = ("one", "two", "three", "four")
    
    @res = $string =~ m/(\w+) (\w+)/;
    print Dumper(\@res); # @res = ("one", "two")
    

    Remember that the left-hand side of the assignment must be in list context, which means you have to surround scalar variables with parentheses:

    ($one, $two) = $string =~ m/(\w+) (\w+)/;
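
    To make the difference concrete, here is a small sketch that reuses the string from the example above; the variable names $flag and $first are introduced here for illustration:

    ```perl
    use strict;
    use warnings;

    my $string = "one two three four";

    # Scalar context: the match returns only a success flag.
    my $flag = $string =~ m/(\w+) (\w+)/;

    # List context (note the parentheses): the match returns the captures.
    my ($first) = $string =~ m/(\w+) (\w+)/;

    print "$flag\n";     # prints 1
    print "$first\n";    # prints one
    ```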
    