Compare file lines for match anywhere in second file

Submitted by 心已入冬 on 2019-12-13 20:10:43

Question


This is frustrating. I have 2 text files that each contain one phone number per line. I need to read each line from file1 and search file2 for a match. If there is no match, write the line's value to an output file. I've been trying this, but I know it's wrong.

$file1 = 'pokus1.txt';
$file2 = 'pokus2.txt';

open (F1, $file1) || die ("Could not open $file1!");
open (F2, $file2) || die ("Could not open $file2!");
open (OUTFILE, '>>output\output_x1.txt');
@f1data = <F1>;
@f2data = <F2>;

while (@f1data){
    @grp = grep {/$f1data/} @f2data;

    print OUTFILE "$grp";
}
close (F1);
close (F2);
close (OUTFILE);

I hope someone can help. Thanks, Brent


Answer 1:


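bash:

Lines of file2 with no match in file1 (swap the two file names to get the lines of file1 that are missing from file2, which is what the question asks for):

grep -Fxvf file1 file2 > file3

Lines of file2 that also appear in file1:

grep -Fxf file1 file2 > file4

Here -F treats each pattern as a fixed string rather than a regular expression, and -x requires a whole-line match, so one number cannot match as a substring of a longer one.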




Answer 2:


A customary solution is to process one file, saving its lines as keys of a hash, and then process the other file, checking whether each line exists as a key:

#!/usr/bin/env perl

use warnings;
use strict;

my (%phone);

open my $fh1, '<', shift or die;
open my $fh2, '<', shift or die;
##open my $ofh, '>>', shift or die;

while ( <$fh2> ) { 
    chomp;
    $phone{ $_ } = 1;
}

while ( <$fh1> ) { 
    chomp;
    next if exists $phone{ $_ };
    ##printf $ofh qq|%s\n|, $_;
    printf qq|%s\n|, $_;
}

exit 0;

Run it like:

perl script.pl file1 file2 > outfile
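
The same hash lookup can also be written with a hash slice and grep, which is close to what the question's code was reaching for. This is only a minimal sketch: only_in_first is a hypothetical helper name, and the in-memory arrays stand in for the lines read (and chomped) from the two files.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the entries of @$first that never occur
# in @$second, preserving the order of @$first.
sub only_in_first {
    my ($first, $second) = @_;
    my %seen;
    @seen{ @$second } = ();    # hash slice: every entry of @$second becomes a key
    return grep { !exists $seen{$_} } @$first;
}

# Made-up sample data standing in for the contents of file1 and file2.
my @file1 = ('555-0100', '555-0101', '555-0102');
my @file2 = ('555-0101', '555-0199');

# Prints 555-0100 and 555-0102, one per line.
print "$_\n" for only_in_first(\@file1, \@file2);
```

Building the hash once keeps each lookup constant-time, whereas running a grep over the second list for every line of the first scans the whole list each time.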



Answer 3:


Whenever you get an "is a piece of data from one group also in another group" type of question (and they come up quite a bit), you should think in terms of hashes.

A hash is a keyed lookup. Let's say you create a hash keyed on say... I don't know... phone numbers taken from file #1. If you read a line in file #2, you can easily see if it's in file #1 by simply looking at the hash. Fast, efficient.

use strict;   #ALWAYS ALWAYS ALWAYS
use warnings; #ALWAYS ALWAYS ALWAYS

use autodie;  #Will end the program if files you try to open don't exist

# Constants are a great way of storing data that is ...uh... constant
use constant {
    FILE_1    =>  "a1.txt",
    FILE_2    =>  "a2.txt",
};

my %phone_hash;

open my $phone_num1_fh, "<", FILE_1;

#Let's build our phone number hash
while ( my $phone_num = <$phone_num1_fh> ) {
    chomp $phone_num;
    $phone_hash{ $phone_num } = 1;   #Doesn't really matter, but best not a zero value
}
close $phone_num1_fh;

#Now that we have our phone hash, let's see if it's in file #2
open my $phone_num2_fh, "<", FILE_2;
while ( my $phone_num = <$phone_num2_fh> ) {
    chomp $phone_num;
    if ( exists $phone_hash{ $phone_num } ) {
        print "$phone_num is in file #1 and file #2\n";
    }
    else {
        print "$phone_num is only in file #2\n";
    }
}
close $phone_num2_fh;

See how nicely that works. The only issue is that there may be phone numbers in file #1 that aren't in file #2. You could solve this by simply creating a second hash for all the phone numbers in file #2.

Let's do this one more time with two hashes:

my %phone_hash1;
my %phone_hash2;

open my $phone_num1_fh, "<", FILE_1;

while ( my $phone_num = <$phone_num1_fh> ) {
    chomp $phone_num;
    $phone_hash1{ $phone_num } = 1;
}
close $phone_num1_fh;

open my $phone_num2_fh, "<", FILE_2;

while ( my $phone_num = <$phone_num2_fh> ) {
    chomp $phone_num;
    $phone_hash2{ $phone_num } = 1;
}
close $phone_num2_fh;

Now, we'll use keys to list each hash's keys and go through them. I'm going to add a phone number to an %in_common hash whenever it appears in both hashes:

my %in_common;

for my $phone ( keys %phone_hash1 ) {
    if ( $phone_hash2{$phone} ) { 
       $in_common{$phone} = 1;    #Phone numbers in common between the two lists
    }
}

Now I have three hashes: %phone_hash1, %phone_hash2, and %in_common.

for my $phone ( sort keys %phone_hash1 ) {
    if ( not $in_common{$phone} ) {
         print "Phone number $phone is only in the first file\n";
    }
}

for my $phone ( sort keys %phone_hash2 ) {
    if ( not $in_common{$phone} ) {
        print "Phone number $phone is only in " . FILE_2 . "\n";
    }
}

for my $phone ( sort keys %in_common ) {
    print "Phone number $phone is in both files\n";
}

Note that in this example I didn't use exists to see if the key is in the hash. That is, I simply put if ( $phone_hash2{$phone} ) instead of if ( exists $phone_hash2{$phone} ). The exists form checks whether the key is present in the hash, even if its value is undefined, a null string, or numerically zero.

The form without exists is true only when the value itself is true, that is, not zero, a null string, or undefined. Since I purposefully set every value in the hash to 1, I can use this form. It's a good habit to use exists because there will be situations where a valid value could be a null string or zero. However, some people like the way the code reads without exists when possible.
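
The difference between the two tests is easiest to see side by side. The following is a small sketch with made-up numbers and values; lookup_modes is a hypothetical helper name, not something from the code above.

```perl
use strict;
use warnings;

# Made-up data: three keys are present, but only the first has a true value.
my %phone = (
    '555-0100' => 1,     # present, true value
    '555-0101' => 0,     # present, numerically zero
    '555-0102' => '',    # present, null string
);

# Hypothetical helper: reports both tests for one key as 'yes' or 'no'.
sub lookup_modes {
    my ($hash, $key) = @_;
    my $by_exists = exists( $hash->{$key} ) ? 'yes' : 'no';
    my $by_truth  = $hash->{$key}           ? 'yes' : 'no';
    return ($by_exists, $by_truth);
}

for my $number ('555-0100', '555-0101', '555-0102', '555-0199') {
    my ($by_exists, $by_truth) = lookup_modes(\%phone, $number);
    print "$number exists=$by_exists true=$by_truth\n";
}
```

Only 555-0100 passes both tests. 555-0101 and 555-0102 are in the hash but fail the plain truth test, and 555-0199 fails both because it was never stored.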



Source: https://stackoverflow.com/questions/16742013/compare-file-lines-for-match-anywhere-in-second-file
