retrieve patterns that exactly match all regex in Perl

白昼怎懂夜的黑 提交于 2021-02-08 14:22:36

问题


I have a database of subgraphs that looks like this:

t # 3-231, 1
v 0 94
v 1 14
v 2 16
v 3 17
u 0 1 2
u 0 2 2
u 0 3 2
t # 3-232, 1
v 0 14
v 1 94
v 2 19
v 3 91
u 0 1 2
u 0 3 2
u 1 2 2
t # 3-233, 1
v 0 17
v 1 91
v 2 16
v 3 94
u 0 1 2
u 0 3 2
u 1 2 2
t # 3-234, 1
v 0 90
v 1 93
v 2 102
v 3 95
u 0 1 2
u 0 3 2
u 1 2 2

I would like to retrieve all transactions that contains the following patterns: 'u 0 1 2' and 'u 0 2 2' along with transaction id (ex. line starts with t #).

I used the following code to accomplish this job:

#!/usr/bin/perl -w

use strict;

my $input = shift @ARGV or die $!; 

open (FILE, "$input") or die $!;

while (<FILE>) {

my @fields = ('t', 'u\ 0\ 1', 'u\ 0\ 2');  
my $field_regex = join( "|", @fields );
my @field_lines;

    push( @field_lines, $_ ) if ( /^(?:$field_regex) / );
    last if @field_lines == @fields;

push @field_lines, "";

print join( "\n", sort @field_lines );
}

close FILE;

However, it retrieves patterns, when only one line match, such as:

t # 3-231, 1
u 0 1 2
u 0 2 2
t # 3-232, 1
u 0 1 2
t # 3-233, 1
u 0 1 2
t # 3-233, 1
u 0 1 2

My ultimate goal is to retrieve transactions that completely match to my regex, such as

t # 3-231, 1
u 0 1 2
u 0 2 2

Thank you for your help!

Olha


回答1:


One way: keep the current transaction-id on hand, and store lines of interest in an arrayref associated with that transaction-id key in a hash.

use warnings;
use strict;
use feature 'say';    
use Data::Dump qw(dd);

my @fields = ('u 0 1', 'u 0 2');  
my $field_regex = join '|', map { quotemeta } @fields;
    
my (%trans, $tid);

while (<>) {
    chomp;
    if (/^t #/) { 
        $tid = $_; 
        next;
    }   
  
    push @{$trans{$tid}}, $_  if /$field_regex/;
}

dd %trans;

# foreach my $tid (sort keys %trans) { 
#     say $tid;
#     say for @{$trans{$tid}};
# }

I use while (<>) which reads line by line all files given on command line when the program is invoked (or STDIN), for simplicity here. I use Data::Dump to show a complex data structure; there is Data::Dumper in the core for that.

The qotemeta escapes all ASCI non-"word" characters, that can throw off regex, and this includes spaces.

The program above in general loses the order of transaction-id's from the file, since hash keys are unordered, while it keeps the order of lines for each id since those are on an array. This is not hard to remedy if needed.

Tested only with the provided data file.




回答2:


This type of pattern is most easily handled by treating the file contents as blocks rather than lines.

Here is an easy example (with your data):

use strict;

my $big_string;
my $trans;
my $block;

open my $fh, '<', '/tmp/file.txt' or die "Can't open file $!";

$big_string = do { local $/; <$fh> };

while ($big_string=~/^(t\h*#\h*[0-9,\h-]+[\s\S]*?(?=(?:^t\h*#\h*[0-9,\h-]+)|\z))/mg) {
    $block=$1;
    $trans=$1 if $block=~/^(t\h*#\h*[0-9,\h-]+)/;
    if ($block=~/^(u\h+0\h+[12]\h+2)/m){
        print "$trans\n";
        for ($block=~/^(u\h+0\h+[12]\h+2)/mg) {
            print "$1\n";
        }
    }
}   

Prints:

t # 3-231, 1
u 0 2 2
u 0 2 2
t # 3-232, 1
u 0 1 2
t # 3-233, 1
u 0 1 2
t # 3-234, 1
u 0 1 2

This assumes that your data fits easily into memory. If not, there are many ways to read a file block by block as well.

If you only want the blocks that have more than one match to the second regex:

while ($big_string=~/^(t\h*#\h*[0-9,\h-]+[\s\S]*?(?=(?:^t\h*#\h*[0-9,\h-]+)|\z))/mg) {
    $block=$1;
    $trans=$1 if $block=~/^(t\h*#\h*[0-9,\h-]+)/;
    @matches=$block=~/^(u\h+0\h+[12]\h+2)/mg;
    if (scalar @matches >1) {
        print "$trans\n";
        print join "\n", @matches;
    }
}   

Prints:

t # 3-231, 1
u 0 1 2
u 0 2 2

And, of course, TIMTOWDI:

my @result = do{ local @_ = split(/^(t\h+#[\h\d,-]+)/m, $big_string); 
                        @_[1..$#_]};

for my $i (0 .. @result/2-1) {
    @matches=$result[2*$i+1]=~/^(u\h+0\h+[12]\h+2)/mg;
    if (scalar @matches>1){
        print "$result[2*$i]\n";
        print join("\n", @matches);
    }
}   
t # 3-231, 1
u 0 1 2
u 0 2 2



回答3:


perl -lne '@h=($_) if /^t #/; push @h,$_ if /^u 0 [12] 2/; if (@h==3) { print shift @h while @h }' file

reset & hold the transaction line; append the matching lines; print and reset if you accumulate 3 lines.



来源:https://stackoverflow.com/questions/63404753/retrieve-patterns-that-exactly-match-all-regex-in-perl

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!