How to compare two text files, remove the matching contents, and write the rest to an output file in Perl?

天大地大妈咪最大 submitted on 2019-12-12 19:55:00

Question


I have two text files, text1.txt and text2.txt, like the ones below:

text1

    ac
    abc
    abcd
    abcde

text2

    ab
    abc
    acd
    abcd

expected output

    ac
    abcde

I need to compare the two files and remove a line from text1 whenever the same line also appears in text2.

I want the code in Perl. This is what I am currently trying:

#!/usr/bin/perl
use strict;
use warnings;

open (GEN, "text1.txt") || die ("cannot open text1.txt: $!");
open (SEA, "text2.txt") || die ("cannot open text2.txt: $!");
open (OUT,">> output.txt") || die ("cannot open output.txt: $!");
open (LOG, ">> logfile.txt");

undef $/;
foreach (<GEN>) {

  my $gen = $_;
  chomp ($gen);
  print LOG $gen;

  foreach (<SEA>) {

    my $sea = $_;
    chomp($sea);
    print LOG $sea;

    if($gen ne $sea) {
      print OUT $gen;
    }
  }
}

With this code I get all of the content from text1, not just the unmatched lines. Please help me out.


Answer 1:


I think you should read text2 into an array and then loop over that array in the second foreach:

my @b = <SEA>;

Otherwise, on every pass of the outer loop after the first, the file pointer is already at the end of text2, so the inner loop has nothing left to read.
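A minimal sketch of this suggestion, applied to the question's loop structure. It assumes line-by-line reading (no undef $/), and it records whether a match was found before printing anything, because printing inside the inner loop would output a line of text1 once for every line of text2 it differs from. The helper name unmatched_lines and the in-memory sample data (opened with open my $fh, '<', \$string) are illustrative only; with real files you would pass handles opened on text1.txt and text2.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the lines read from $gen_fh that do not
# appear anywhere in $sea_fh.
sub unmatched_lines {
    my ($gen_fh, $sea_fh) = @_;

    # Read text2 into an array once, so it can be scanned again for every line of text1.
    chomp(my @sea = <$sea_fh>);

    my @result;
    while (my $gen = <$gen_fh>) {
        chomp $gen;
        my $found = 0;
        foreach my $sea (@sea) {
            if ($gen eq $sea) {
                $found = 1;
                last;    # one match is enough
            }
        }
        # Decide only after comparing with every line of text2.
        push @result, $gen unless $found;
    }
    return @result;
}

# Sample data from the question, read through in-memory filehandles.
my $text1 = "ac\nabc\nabcd\nabcde\n";
my $text2 = "ab\nabc\nacd\nabcd\n";
open my $gen_fh, '<', \$text1 or die $!;
open my $sea_fh, '<', \$text2 or die $!;

print "$_\n" for unmatched_lines($gen_fh, $sea_fh);   # prints "ac" and "abcde"
```

The array costs one pass over text2; every line of text1 is then compared against it in memory.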




Answer 2:


One way:

#!/usr/bin/perl
use strict;
use warnings;

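$\="\n";   # output record separator: each print gets a trailing newline

open my $fh1, '<', 'file1' or die $!;
open my $fh2, '<', 'file2' or die $!;
open my $out, '>', 'file3' or die $!;

# Read both files into arrays and strip the newlines.
chomp(my @arr1=<$fh1>);
chomp(my @arr2=<$fh2>);

# Keep a line of file1 only if no line of file2 matches it exactly
# (\Q...\E stops regex metacharacters in $x from being interpreted).
foreach my $x (@arr1){
        print $out $x if (!grep (/^\Q$x\E$/,@arr2));
}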

close $fh1;
close $fh2;
close $out;

After executing the above, the file 'file3' contains:

$ cat file3
ac
abcde



Answer 3:


This is my plan:

  1. Read the contents of the first file into a hash, using each line as a key. For example, with your data you get:

    %lines = ( 'ac' => 1,
        'abc' => 1,
        'abcd' => 1,
        'abcde' => 1);
    
  2. Read the second file, deleting each of its lines from %lines if that key exists.

  3. Print the keys of %lines to the desired file.

Example:

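use strict;
use warnings;

open my $fh1, '<', 'text1' or die $!;
open my $fh2, '<', 'text2' or die $!;
open my $out, '>', 'output' or die $!;
my %lines = ();

# Step 1: every line of the first file becomes a key.
while( my $key = <$fh1> ) {
   chomp $key;
   $lines{$key} = 1;
}

# Step 2: remove every key that also appears in the second file.
while( my $key = <$fh2> ) {
   chomp $key;
   delete $lines{$key};
}

# Step 3: print what is left. Hash order is unspecified, so the lines
# may not come out in their original order.
foreach my $key (keys %lines){
   print $out $key, "\n";
}

close $fh1;
close $fh2;
close $out;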



Answer 4:


Your main problem is that you have set the input record separator $/ to undef. That means each file is read as a single string, so all your comparison can tell you is whether the two files as a whole are different.
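To see what undef $/ does, here is a small sketch that reads the question's text1 data once in normal line mode and once with $/ undefined. The helper count_records and the in-memory filehandle are illustrative assumptions, not part of the question's code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns how many records readline produces for $data,
# either with the default input record separator or with $/ undefined.
sub count_records {
    my ($data, $slurp) = @_;
    open my $fh, '<', \$data or die $!;

    local $/;                   # $/ is now undef (slurp mode) until the sub returns
    $/ = "\n" unless $slurp;    # restore the normal line separator if not slurping

    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

my $text1 = "ac\nabc\nabcd\nabcde\n";
print "line mode:  ", count_records($text1, 0), " records\n";   # 4 records
print "slurp mode: ", count_records($text1, 1), " record\n";    # 1 record
```

In line mode readline returns one record per line; in slurp mode the whole file comes back as one string, which is why $gen and $sea in the question can only ever hold entire files.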

Remove undef $/ and things will work a whole lot better. However, the inner for loop will then read every line of file2 and print the first line of file1 once for each line of file2 that doesn't match it. The second time the inner loop is reached, all the data has already been read from the file, so the body of the loop won't be executed at all. You must either open file2 inside the outer loop or read the file into an array and loop over that instead.
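The following sketch illustrates why the inner loop stops running and how reopening the file fixes it. It reads the question's text2 data three times, as the inner loop would for three lines of text1, either through one shared handle or through a handle reopened on each pass. The helper lines_per_pass and the in-memory data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads the same data $passes times and returns how many
# lines each pass saw. With $reopen false, every pass shares one filehandle,
# like the question's SEA handle; with $reopen true, the handle is reopened
# before each later pass.
sub lines_per_pass {
    my ($data, $passes, $reopen) = @_;

    open my $fh, '<', \$data or die $!;

    my @counts;
    for my $pass (1 .. $passes) {
        if ($reopen && $pass > 1) {
            close $fh;
            open $fh, '<', \$data or die $!;   # start again at the beginning
        }
        my $n = 0;
        while (my $line = <$fh>) {
            $n++;
        }
        push @counts, $n;   # an exhausted handle yields 0 lines
    }
    close $fh;
    return @counts;
}

my $text2 = "ab\nabc\nacd\nabcd\n";
print "shared handle:   ", join(' ', lines_per_pass($text2, 3, 0)), "\n";   # 4 0 0
print "reopened handle: ", join(' ', lines_per_pass($text2, 3, 1)), "\n";   # 4 4 4
```

With the shared handle only the first pass sees any lines; reopening the file (or reading its lines into an array once) gives every pass the full contents.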

Then again, do you really want to print a line of file1 once for every line of file2 that differs from it?

Update

As I wrote in my comment, it sounds like you want to output the lines in text1 that don't appear anywhere in text2. That is easily achieved using a hash:

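use strict;
use warnings;

my %exclude;

# Remember every line of text2.txt.
open my $fh, '<', 'text2.txt' or die $!;
while (<$fh>) {
  chomp;
  $exclude{$_}++;
}

# Print each line of text1.txt that was not seen in text2.txt, in file order.
open $fh, '<', 'text1.txt' or die $!;   # reopening the handle closes text2.txt first
while (<$fh>) {
  chomp;
  print "$_\n" unless $exclude{$_};
}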

With the data you show in your question, that produces this output

ac
abcde



Answer 5:


I would like to view your problem like this:

  • You have a set S of strings in file.txt.
  • You have a set F of forbidden strings in forbidden.txt.
  • You want the strings that are allowed, that is, S \ F (set difference).

Perl has a data structure that implements a set of strings: the hash. (It can also map each string to a scalar value, but that is secondary here.)
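One consequence of mapping every string to undef: membership has to be tested with exists, because the stored value is false even for strings that are in the set. A small sketch using the forbidden strings from the question (the helper in_set is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A set of strings: only the keys matter, every value is undef.
my %Forbidden = map { $_ => undef } qw(ab abc acd abcd);

# Hypothetical helper: 1 if $word is a member of the set, else 0.
sub in_set {
    my ($set, $word) = @_;
    return exists $set->{$word} ? 1 : 0;
}

for my $word (qw(abc ac)) {
    # $Forbidden{$word} is undef (false) even for members, so only exists() is reliable.
    printf "%-4s exists: %d  value is true: %d\n",
        $word, in_set(\%Forbidden, $word), $Forbidden{$word} ? 1 : 0;
}
```

For "abc" this reports exists: 1 but value is true: 0, which is exactly the case a plain truth test would get wrong.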

So first we create the set of the lines we have. We let all the strings in that file map to undef, as we don't need that value:

open my $FILE, "<", "file.txt" or die "Can't open file.txt: $!";
my %Set = map {$_ => undef} <$FILE>;

We create the forbidden set the same way:

open my $FORBIDDEN, "<", "forbidden.txt" or die "Can't open forbidden.txt: $!";
my %Forbidden = map {$_ => undef} <$FORBIDDEN>;

The set difference can be computed in either of these ways:

  • For each element x in S, x is in the result set R iff x isn't in F.

    my %Result = map {$_ => $Set{$_}} grep {not exists $Forbidden{$_}} keys %Set;
    
  • The result set R initially is S. For each element in F, we delete that item from R:

    my %Result = %Set; # make a copy
    delete $Result{$_} for keys %Forbidden;
    

(The keys function returns the elements of the set of strings.)

We can then print out all the keys: print keys %Result.
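Putting both variants together on the question's data: in this sketch the lines are given as already-chomped lists instead of being read from file.txt and forbidden.txt, an assumption made so it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The question's data as lists of lines; the answer reads these from files.
my @file_lines      = qw(ac abc abcd abcde);
my @forbidden_lines = qw(ab abc acd abcd);

my %Set       = map { $_ => undef } @file_lines;
my %Forbidden = map { $_ => undef } @forbidden_lines;

# Way 1: keep every element of S that is not in F.
my %Result1 = map { $_ => $Set{$_} } grep { not exists $Forbidden{$_} } keys %Set;

# Way 2: start from a copy of S and delete every element of F.
my %Result2 = %Set;
delete $Result2{$_} for keys %Forbidden;

# Hash order is unspecified, so sort the keys before printing or comparing.
my @r1 = sort keys %Result1;
my @r2 = sort keys %Result2;
print "way 1: @r1\n";   # way 1: abcde ac
print "way 2: @r2\n";   # way 2: abcde ac
```

Both variants leave the same keys. Sorting makes the output deterministic, but it gives alphabetical order rather than file order.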

But what if we want to preserve the order? Entries in a hash can also carry an associated value, so why not the line number? We create the set S like this:

open my $FILE, "<", "file.txt" or die "Can't open file.txt: $!";
my $line_no = 1;
my %Set = map {$_ => $line_no++} <$FILE>;

Now this value is carried around with the string, and we can access it at the end. Specifically, we sort the keys of the hash by their line number:

my @sorted_keys = sort { $Result{$a} <=> $Result{$b} } keys %Result;
print @sorted_keys;

Note: all of this assumes that every line in both files, including the last one, ends with a newline. Otherwise you would have to chomp the lines before using them as keys.
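A combined sketch that chomps every line, so a last line without a newline still matches, and preserves the original order through line numbers. The helper allowed_lines and the in-memory filehandles are illustrative assumptions. Unlike the map version above, which ends up with the last line number when a line occurs more than once, this sketch keeps the first one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of $file_fh that are not in
# $forbidden_fh, in their original order.
sub allowed_lines {
    my ($file_fh, $forbidden_fh) = @_;

    # S: each distinct line mapped to the line number where it first appears.
    my %Set;
    my $line_no = 0;
    while (my $line = <$file_fh>) {
        chomp $line;
        $line_no++;
        $Set{$line} = $line_no unless exists $Set{$line};
    }

    # F: the forbidden lines, used only as a set.
    my %Forbidden;
    while (my $line = <$forbidden_fh>) {
        chomp $line;
        $Forbidden{$line} = undef;
    }

    # R = S \ F, then sort the survivors by their line number.
    my %Result = %Set;
    delete $Result{$_} for keys %Forbidden;
    return sort { $Result{$a} <=> $Result{$b} } keys %Result;
}

# The question's data; the last line of text1 deliberately has no newline.
my $text1 = "ac\nabc\nabcd\nabcde";
my $text2 = "ab\nabc\nacd\nabcd\n";
open my $file_fh,      '<', \$text1 or die $!;
open my $forbidden_fh, '<', \$text2 or die $!;

print "$_\n" for allowed_lines($file_fh, $forbidden_fh);   # prints "ac", then "abcde"
```

Because both inputs are chomped, abcde is compared correctly even without a trailing newline, and the result comes out in file order.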



Source: https://stackoverflow.com/questions/14723333/how-to-compare-two-text-files-and-removing-the-matching-contents-and-pass-to-out
