I have a file with more than 40,000 lines (file1) and I want to extract the lines matching patterns in file2 (about 6,000 lines). I use grep for this, but it is very slow.
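With a plain pattern-file search, presumably something along these lines, grep re-tests every pattern from file2 as a regex against every line of file1, which is why it crawls:

# Assumed shape of the slow original, not necessarily the poster's exact command:
# -f reads all ~6,000 patterns from file2 and, without -F, each one is a regex.
grep -f file2 file1 > out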
Just for fun, here's a Perl version:
#!/usr/bin/perl
use strict;
use warnings;
my %patterns;
# Read file2 once and store every pattern as a hash key
open(my $fh2, "<", "file2") or die "ERROR: Could not open file2: $!";
while (<$fh2>)
{
    chomp;                     # chomp, not chop: only strip a trailing newline
    $patterns{$_} = 1;
}
close $fh2;
# Now read the data file; print lines whose second field is a known pattern
open(my $fh1, "<", "file1") or die "ERROR: Could not open file1: $!";
while (<$fh1>)
{
    my (undef, $srch) = split;
    print $_ if defined $srch and exists $patterns{$srch};
}
close $fh1;
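The win comes from doing the pattern handling once: file2 is read a single time into %patterns, and after that each line of file1 costs one hash lookup on its second field rather than another pass over all 6,000 patterns.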
Here are some timings, using a 60,000-line file1 and a 6,000-line file2 generated per Ed's file creation method (a stand-in generator is sketched after the timings):
time awk 'NR==FNR{pats[$0]; next} $2 in pats' file2 file1 > out
real 0m0.202s
user 0m0.197s
sys 0m0.005s
time ./go.pl > out2
real 0m0.083s
user 0m0.079s
sys 0m0.004s
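I haven't copied Ed's exact file-creation commands; a stand-in that makes test data of the same size and shape (a key in field 2 of each file1 line, one key per line in file2) would be something like:

# Stand-in generator only, not Ed's actual method:
# 60,000 data lines with a key in field 2, and 6,000 of those keys in file2.
awk 'BEGIN{for (i=1; i<=60000; i++) print "x" i, "key" (i % 10000), "y" i}' > file1
awk 'BEGIN{for (i=1; i<=6000; i++) print "key" i}' > file2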