Question
I have two text files.
The first one has a list of words, like the following:
File1.txt
Laura
Samuel
Gerry
Peter
Maggie
The second one has paragraphs in it. For example:
File2.txt
Laura
is
about
to
meet
Gerry
and
is
planning
to
take
Peter
along
All I want the program to do is look for common words and print MATCH beside the matching words, either in File2.txt or in a third output file.
So the desired output should look like this.
Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
I have tried the following code, but I am not getting the desired output.
use warnings;
use strict;
use Data::Dumper;
my $result = { };
my $first_file = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output = 'output2.txt';
open my $a_fh, '<', $first_file or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";
open( OUTPUT, '>' . $output ) or die "Cannot create $output.\n";
while ( <$a_fh> ) {
    chomp;
    next if /^$/;
    $result->{$_}++;
}
while ( <$b_fh> ) {
    chomp;
    next if /^$/;
    if ( $result->{$_} ) {
        delete $result->{$_};
        $result->{ join " |" => $_, "MATCH" }++;
    }
    else {
        $result->{$_}++;
    }
}
{
    $Data::Dumper::Sortkeys = 0;
    print OUTPUT Dumper $result;
}
But the output that I am getting is like this.
Laura | MATCH
Samuel | MATCH
take
Maggie | MATCH
Laura
about
to
Gerry
meet
Gerry | MATCH
and
is
Maggie |MATCH
planning
to
Peter |MATCH
take
Peter |MATCH
The output is not in paragraph order, nor is it printing MATCH for all matches.
Please advise.
Answer 1:
Here's one example, which also handles multiple paragraph files. I populate an array @files with the files I want to check, read the wordlist file into a hash, and then iterate over the paragraph files one line at a time. I split each line into words, strip punctuation, and print each word on its own line, but only after checking whether the word is in the wordlist. If it is, I print it with " | MATCH" appended.
Paragraph file 1:
Laura is about to meet Gerry, and is planning to take Peter along.
But Peter and Sarah have other plans.
Paragraph file 2:
Blah Peter has lost it.
The code:
use warnings;
use strict;
# paragraph files to check against the word list
my @files = ('file.txt', 'file2.txt');
open my $word_fh, '<', 'wordlist.txt' or die $!;
# one hash key per word in wordlist.txt; only key existence matters
my %words_to_match = map {chomp $_; $_ => 0} <$word_fh>;
close $word_fh;
check($_) for @files;
sub check {
    my $file = shift;
    open my $fh, '<', $file or die $!;
    while (<$fh>){
        chomp;
        my @words_in_line = split;
        for my $word (@words_in_line){
            # strip punctuation so that "Gerry," still matches "Gerry"
            $word =~ s/[\.,;:!]//g;
            $word .= ' | MATCH' if exists $words_to_match{$word};
            print " $word\n";
        }
        print "\n";
    }
}
Output:
Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
But
Peter | MATCH
and
Sarah
have
other
plans
Blah
Peter | MATCH
has
lost
it
If you want to print it to a file, open a write file handle, and change the print statement inside the while loop to print $wfh ... .
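As a rough sketch of that last suggestion, the loop can take the write handle as an argument. The helper name write_annotated, the file name output.txt, and the hard-coded word list and sample line are assumptions made for illustration; only $wfh and %words_to_match come from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: annotate each word with " | MATCH" when it is a key
# of the lookup hash, and write one word per line to the handle $wfh.
sub write_annotated {
    my ($wfh, $words_to_match, @lines) = @_;
    for my $line (@lines) {
        my @words_in_line = split ' ', $line;
        for my $word (@words_in_line) {
            $word =~ s/[.,;:!]//g;    # same punctuation stripping as above
            $word .= ' | MATCH' if exists $words_to_match->{$word};
            print {$wfh} "$word\n";
        }
    }
    return;
}

# Word list hard-coded here to keep the sketch self-contained.
my %words_to_match = map { $_ => 0 } qw(Laura Samuel Gerry Peter Maggie);

# 'output.txt' is an assumed file name.
open my $wfh, '>', 'output.txt' or die "output.txt: $!";
write_annotated($wfh, \%words_to_match, 'Laura is about to meet Gerry,');
close $wfh or die "output.txt: $!";
```

Passing the handle in, rather than opening it inside the sub, means every paragraph file can be written to the same output file.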
Answer 2:
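I think you didn't get the desired output because you stuffed it into the hash $result and then printed that with Data::Dumper. Data::Dumper will print a hash in an arbitrary order, especially if you set $Data::Dumper::Sortkeys=0. The words from File2.txt also go into the same hash as the words from File1.txt, so a repeated word such as "is" is already in the hash on its second occurrence and gets marked as a match.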
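To make the point about hashes concrete, here is a small sketch that uses the words from File2.txt as an in-memory list. The variable names @lines, %seen and @ordered are made up for this illustration.

```perl
use strict;
use warnings;

# The 13 words of File2.txt, in the order they appear.
my @lines = qw(Laura is about to meet Gerry and is planning to take Peter along);

# Used as hash keys, repeated words ("is", "to") collapse into a single key,
# and the order returned by keys() is not the order the words were read in.
my %seen;
$seen{$_}++ for @lines;
printf "lines read: %d, hash keys: %d\n", scalar @lines, scalar keys %seen;
# prints "lines read: 13, hash keys: 11"

# An array keeps every word, duplicates included, in reading order.
my @ordered = @lines;
print "$_\n" for @ordered;
```

A hash is still the right structure for looking up the words from File1.txt. It just should not be used to hold the output.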
I changed your code a bit so that each line is written out as soon as it has been read from File2.txt and it is clear whether it matched or not.
#!/usr/bin/env perl
use strict;
use warnings;
my $result = {};
my $first_file = shift || 'File1.txt';
my $second_file = shift || 'File2.txt';
my $output = 'output2.txt';
open my $a_fh, '<', $first_file or die "$first_file: $!";
open my $b_fh, '<', $second_file or die "$second_file: $!";
open( my $out, '>', $output ) or die "Cannot create $output.\n";
# remember words from File1.txt in the hash $result:
while ( my $line = <$a_fh> ) {
    $line =~ s/^\s*//; # strip leading whitespace
    $line =~ s/\s*$//; # strip trailing ws
    next if $line =~ /^$/; # skip now empty lines
    $result->{$line} = 1;
}
# now $result consists of all "words" in File1.txt, like
# $result = {
#     'Gerry' => 1,
#     'Laura' => 1,
#     'Maggie' => 1,
#     'Peter' => 1,
#     'Samuel' => 1
# };
# now iterate over File2.txt, print all lines and append
# 'MATCH' for those in File1.txt:
while ( my $line = <$b_fh> ) {
    $line =~ s/^\s*//; # strip leading whitespace
    $line =~ s/\s*$//; # strip trailing ws
    next if $line =~ /^$/; # skip now empty lines
    # print the line from File2.txt (without \n):
    print $out $line;
    # if this line (word) was found
    # in File1.txt, then append " | MATCH"
    if ( $result->{$line} ) {
        print $out ' | MATCH';
    }
    # print final \n
    print $out "\n";
}
Output:
Laura | MATCH
is
about
to
meet
Gerry | MATCH
and
is
planning
to
take
Peter | MATCH
along
Source: https://stackoverflow.com/questions/37899315/perl-program-to-find-matching-words-in-a-paragraph