Separating an Output with a Tab / Space : Perl

Submitted by 陌路散爱 on 2019-12-22 09:49:58

Question


I am working with three text documents. The first one is the main input (Input 1), with words and their word type (Noun, Verb, etc.) separated by a tab.

Input 1

John    N
goes    V
to      P
school  N
.       S
Mary    N
comes   V
from    P
home    N
.       S

The second and third input text files look like this:

Input 2

John
Mary

Input 3

to
from

My objective is to match the words in the second and third text files against the main input and get an output like this:

Expected output:

John    N   N
goes    V
to      P   P
school  N
.       S
Mary    N   N
comes   V
from    P   P
home    N
.       S

So, basically, all three columns should be separated by a tab or space. However, I am getting output like this:

John N  
N
goes    
V
to P    
P
school  
N
.   
S
Mary N  
N
comes   
V
from P  
P
home    
N
.   
S

I believe this is happening because I read the first text file into an array and printed the values. Please suggest a way to get the desired output. The code I used is below:

#!/usr/bin/perl

use warnings;
use strict;

my @file = ('Input 1.txt');

open my $word_fh, '<', 'Input 2.txt' or die $!;
open my $word2_fh, '<', 'Input 3.txt' or die $!;

my %words_to_match = map {chomp $_; $_ => 0} <$word_fh>;
my %words_to_match2 = map {chomp $_; $_ => 0} <$word2_fh>;

close $word_fh;
close $word2_fh;

check($_) for @file;

sub check {
    my $file = shift;

    open my $fh, '<', $file or die $!;

    while (<$fh>){
        chomp;
        my @words_in_line = split;

        for my $word (@words_in_line){
            $word =~ s/[(\.,;:!)]//g;
            $word .= '  N' if exists $words_to_match{$word};
            $word .= '  P' if exists $words_to_match2{$word};

            print "$word\n";
        }
        print "\n";
    }
}

Again, the objective is to have an output with all three columns separated by a tab or space. Any hints or suggestions are appreciated. Thanks in advance.


Answer 1:


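You are outputting an unnecessary newline, and you are constructing your new output line incorrectly: the loop visits every field on the line and prints each one on its own line, so the tag is attached to a single word instead of the whole line. There is no need to search your hashes for the "type" column; only the first field (the word) needs to be looked up. This produces the desired output.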

use warnings;
use strict;

my @file = ('Input 1.txt');

open my $word_fh,  '<', 'Input 2.txt' or die $!;
open my $word2_fh, '<', 'Input 3.txt' or die $!;

my %words_to_match  = map { chomp $_; $_ => 0 } <$word_fh>;
my %words_to_match2 = map { chomp $_; $_ => 0 } <$word2_fh>;

close $word_fh;
close $word2_fh;

check($_) for @file;

sub check {
    my $file = shift;
    open my $fh, '<', $file or die $!;
    while (<$fh>) {
        chomp;
        my ($word, $type) = split;
        my $line = $_;
        $line .= '  N' if exists $words_to_match{$word};
        $line .= '  P' if exists $words_to_match2{$word};
        print "$line\n";
    }
}
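
If a strict tab-separated result is wanted, as the question title suggests, one option is to collect the fields in a list and join them with "\t" instead of appending to the original line. The following is only a sketch: the helper name tag_line is hypothetical, and the two hashes are filled from in-memory word lists standing in for 'Input 2.txt' and 'Input 3.txt' so that it runs without any files.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the words in 'Input 2.txt' and 'Input 3.txt'.
my %words_to_match  = map { $_ => 1 } qw(John Mary);
my %words_to_match2 = map { $_ => 1 } qw(to from);

# Turn one "word<TAB>type" input line into a tab-separated output line,
# adding a third column when the word appears in one of the lookup hashes.
sub tag_line {
    my ($line) = @_;
    my @fields = split ' ', $line;      # splits on any whitespace, drops the newline
    return '' unless @fields;           # blank lines become empty strings
    push @fields, 'N' if exists $words_to_match{ $fields[0] };
    push @fields, 'P' if exists $words_to_match2{ $fields[0] };
    return join "\t", @fields;
}

print tag_line($_), "\n" for "John\tN\n", "goes\tV\n", "to\tP\n";
```

Because every column is joined with exactly one tab, the result stays easy to parse again later, even if it does not always line up visually.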



Answer 2:


It makes things a lot easier if you read all your reference files and build data structures from them first, and then read your primary input file and transform it.

You're using two hashes, %words_to_match and %words_to_match2, and storing every element with a value of zero. That's a waste of information, and the best thing here is to build a single hash that relates the words in each reference file to their part of speech. The words in Input 2.txt are nouns, so they get an N, while those in Input 3.txt are prepositions, so they get a P.

Then you just have to check whether there exists a hash element that matches each word in Input 1.txt and, if so, append its value before printing the record.

The program below creates a hash %words looking like this, which relates every word in the two reference files to its part of speech

( from => "P", John => "N", Mary => "N", to => "P" )

and in the final input loop I've used a substitution s/// to replace all trailing whitespace (which includes newlines) with three spaces, the part of speech, and a newline. Tabs aren't useful things for laying out tables, firstly because no one can agree where the tab stops should be, and secondly because a single tab won't always line up columns. Depending on how many characters there were in the preceding data, you may sometimes need two or more.

I hope it's clear.

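use strict;
use warnings 'all';
use autodie;    # open() dies with a useful message on failure

# Maps each reference word to its part of speech
my %words;

# Reference files and the part of speech their words represent
my %files = (
    'Input 2.txt' => 'N',
    'Input 3.txt' => 'P',
);

while ( my ( $file, $pos ) = each %files ) {
    open my $fh, '<', $file;

    while ( <$fh> ) {
        s/\s+\z//;          # strip trailing whitespace, including the newline
        $words{$_} = $pos;
    }
}

{
    open my $fh, '<', 'Input 1.txt';

    while ( <$fh> ) {
        next unless /\S/;   # blank lines are printed unchanged
        my ($word) = split;
        my $pos = $words{$word};
        s/\s+\z/   $pos\n/ if $pos;
    }
    continue {
        print;              # runs for every line, including after "next"
    }
}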

output

John    N   N
goes    V
to      P   P
school  N
.       S
Mary    N   N
comes   V
from    P   P
home    N
.       S
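
On the point about tabs not lining up, one alternative is to pad each column to a fixed width with sprintf. This is only a sketch under assumptions: the helper name format_row, the column widths (8 and 4 characters) and the sample rows are all illustrative and are not part of the answer's program.

```perl
use strict;
use warnings;

# Hypothetical sample rows: [word, type, extra tag or empty string].
my @rows = (
    [ 'John',   'N', 'N' ],
    [ 'goes',   'V', ''  ],
    [ 'school', 'N', ''  ],
);

# Left-align the word in 8 characters and the type in 4, then add the tag.
# The widths are assumptions; longer words are not truncated and would
# push the later columns to the right.
sub format_row {
    my ($word, $type, $tag) = @_;
    my $line = sprintf '%-8s%-4s%s', $word, $type, $tag;
    $line =~ s/\s+\z//;    # drop the padding left over when there is no tag
    return $line;
}

print format_row(@$_), "\n" for @rows;
```

With these widths the rows come out in the same shape as the listing above, for example "John    N   N" and "goes    V". In practice the word column width would need to be at least as long as the longest word in the input.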



Answer 3:


The problem is this:

my @words_in_line = split;

for my $word (@words_in_line){
    ...
}

What you want to do is look at the first word in the line, see if it matches any of your %words_to_match variables, and if it does then append the N or P to the entire line.

Right now you're looking at each word in the line, instead of just the first one. Then you're appending the N and P to the word itself, instead of to the whole line.

Here's what the correct pseudocode would look like:

# get the first word in the line
# if it matches `%words_to_match` then append the `  N` to the entire line
# if it matches `%words_to_match2` then append the `  P` to the entire line
# print the line

I got this pseudocode by taking the first paragraph of my answer and breaking it down into pieces.

Anyways, in Perl it looks like this:

my ($first_word) = split;

$_ .= '  N' if exists $words_to_match{$first_word};
$_ .= '  P' if exists $words_to_match2{$first_word};

print "$_\n";
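
For context, here is how that fragment might sit inside the read loop. This is a sketch rather than the asker's full program: the hashes are filled from in-memory word lists instead of 'Input 2.txt' and 'Input 3.txt', the input is a short in-memory string standing in for 'Input 1.txt', and the @out array is introduced only to make the result easy to inspect.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the hashes built from 'Input 2.txt' and 'Input 3.txt'.
my %words_to_match  = map { $_ => 0 } qw(John Mary);
my %words_to_match2 = map { $_ => 0 } qw(to from);

# A few lines of Input 1 held in memory, so the sketch runs without files.
my $input = "John\tN\ngoes\tV\nto\tP\n";
open my $fh, '<', \$input or die $!;

my @out;
while (<$fh>) {
    chomp;
    my ($first_word) = split;
    next unless defined $first_word;    # skip blank lines

    # Append the tag to the whole line, not to an individual word.
    $_ .= '  N' if exists $words_to_match{$first_word};
    $_ .= '  P' if exists $words_to_match2{$first_word};

    push @out, $_;
}
close $fh;

print "$_\n" for @out;
```

Each input line produces exactly one output line, which is what removes the stray line breaks seen in the question.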


Source: https://stackoverflow.com/questions/37974486/separating-an-output-with-a-tab-space-perl
