Why is lookahead (sometimes) faster than capturing?

粉色の甜心 2020-12-19 07:38

This question is inspired by this other one.

Comparing s/,(\d)/$1/ to s/,(?=\d)//: the former uses a capture group to replace only the di

2 Answers
  • 2020-12-19 08:22

    The two approaches do different things and have different overhead costs. When you capture, Perl has to make a copy of the captured text. A look-ahead matches without consuming, so it only has to mark the location where it starts. You can see what's happening by using the re 'debug' pragma:

    use re 'debug';
    my $capture = qr/,(\d)/;
    
    Compiling REx ",(\d)"
    Final program:
       1: EXACT <,> (3)
       3: OPEN1 (5)
       5:   DIGIT (6)
       6: CLOSE1 (8)
       8: END (0)
    anchored "," at 0 (checking anchored) minlen 2 
    Freeing REx: ",(\d)"
    
    use re 'debug';
    my $lookahead = qr/,(?=\d)/;
    
    Compiling REx ",(?=\d)"
    Final program:
       1: EXACT <,> (3)
       3: IFMATCH[0] (8)
       5:   DIGIT (6)
       6:   SUCCEED (0)
       7: TAIL (8)
       8: END (0)
    anchored "," at 0 (checking anchored) minlen 1 
    Freeing REx: ",(?=\d)"
    

    I'd expect look-ahead to be faster than capturing in most cases but, as noted in the other thread, regex performance can be data-dependent.
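
    To make the "matches without consuming" point concrete, here is a minimal sketch. The sample strings and the matched_length helper are made up for illustration and are not part of the original answer. It checks that both substitutions produce the same text and shows how many characters each pattern actually consumes, which lines up with the minlen 2 and minlen 1 values in the debug output above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample input, chosen only for illustration.
    my $input = 'a,1,b,22,,3';

    # Capture version: consume the comma and the digit, then put the digit back.
    (my $with_capture = $input) =~ s/,(\d)/$1/g;

    # Look-ahead version: consume only the comma; the digit is tested but left in place.
    (my $with_lookahead = $input) =~ s/,(?=\d)//g;

    print "capture:   $with_capture\n";
    print "lookahead: $with_lookahead\n";

    # Hypothetical helper: length of the text consumed by the first match,
    # or undef if the pattern does not match at all.
    sub matched_length {
        my ($text, $re) = @_;
        return $text =~ $re ? $+[0] - $-[0] : undef;
    }

    printf "capture consumes %d chars, look-ahead consumes %d\n",
        matched_length('x,7', qr/,(\d)/),
        matched_length('x,7', qr/,(?=\d)/);
    ```

    Both substitutions remove the same commas, so the output text is identical. The difference is only in how much text the engine has to consume and remember along the way.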

  • 2020-12-19 08:26

    As always, when you want to know which of two pieces of code works faster, you have to test it:

    #!/usr/bin/perl
    
    use 5.012;
    use warnings;
    use Benchmark qw<cmpthese>;
    
    say "Extreme ,,,:";
    my $Text = ',' x (my $LEN = 512);
    cmpthese my $TIME = -10, my $CMP = {
        capture => \&capture,
        lookahead => \&lookahead,
    };
    
    say "\nExtreme ,0,0,0:";
    $Text = ',0' x $LEN;
    cmpthese $TIME, $CMP;
    
    my $P = 0.01;
    say "\nMixed (@{[$P * 100]}% zeros):";
    my $zeros = $LEN * $P;
    $Text = ',' x ($LEN - $zeros) . ',0' x $zeros;
    cmpthese $TIME, $CMP;
    
    sub capture {
        local $_ = $Text;
        s/,(\d)/$1/;
    }
    
    sub lookahead {
        local $_ = $Text;
        s/,(?=\d)//;
    }
    

    The benchmark tests three different cases:

    1. Only ','
    2. Only ',0'
    3. 1% ',0', rest ','

    On my machine and with my perl version, it produces these results:

    Extreme ,,,:
                 Rate   capture lookahead
    capture   23157/s        --       -1%
    lookahead 23362/s        1%        --
    
    Extreme ,0,0,0:
                   Rate   capture lookahead
    capture    419476/s        --      -65%
    lookahead 1200465/s      186%        --
    
    Mixed (1% zeros):
                 Rate   capture lookahead
    capture   22013/s        --       -4%
    lookahead 22919/s        4%        --
    

    These results support the assumption that the look-ahead version is significantly faster than the capturing version, except when the input consists almost entirely of commas. That is not very surprising, as PSIAlt already explained in his comment.
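
    One possible reading of why the comma-heavy cases come out nearly equal: the substitutions in the benchmark have no /g flag, so each call stops at the first match. The sketch below is my own addition, with a hypothetical first_match_offset helper and explicit int() calls standing in for the truncation that x is assumed to apply to a fractional count. It rebuilds the three inputs and reports where that first match sits.

    ```perl
    use strict;
    use warnings;

    my $LEN = 512;
    my $P   = 0.01;

    # int() spells out the truncation; the benchmark passes the fractional
    # values straight to the x operator (assumed to truncate the same way).
    my $pairs  = int($LEN * $P);
    my $commas = int($LEN - $LEN * $P);

    my %input = (
        'only commas' => ',' x $LEN,
        'only ,0'     => ',0' x $LEN,
        'mixed'       => ',' x $commas . ',0' x $pairs,
    );

    # Hypothetical helper: offset of the first match, or -1 if there is none.
    sub first_match_offset {
        my ($text, $re) = @_;
        return $text =~ $re ? $-[0] : -1;
    }

    for my $name (sort keys %input) {
        printf "%-12s capture at %d, look-ahead at %d\n", $name,
            first_match_offset($input{$name}, qr/,(\d)/),
            first_match_offset($input{$name}, qr/,(?=\d)/);
    }
    ```

    In the ',0' input the match is at offset 0, so the cost of the match and the replacement itself dominates. In the other two inputs the engine first has to reject several hundred commas (or never matches at all), and that scanning work is presumably the same for both patterns.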

    regards, Matthias
