Perl Scripting | using HASHes

问题

I have an input file which looks like below

 =IP1
abc[0]
abc[1]
abc[2]
=IP2
def[4]
def[8]
def[9]

I need to get the output in the below format -

=IP1
abc[0-2]
=IP2
def[4,8-9]

I have been trying to achieve the above using hashes where I read each line of the file and then split(with'[') each line, I keep the first part as key and read the file again to keep the values in an array for the hash keys. But I am getting stuck in a loop. Can anyone provide help on how to achieve the above ?

回答1:

There are several interesting subproblems. First, you want to keep track of the most recent header (ie, =IP1). Second, you want to keep track of lists of numbers that are associated with some keys, and third, you want to generate range strings.

Here's how I would do it:

#!/usr/bin/env perl

use strict;
use warnings;

my $tl;
my %h;

# First process the lines of the input file.
while(<DATA>) {
    chomp;
    next unless length;
    if(/^(=\w{2}\d+)$/) { # Recognize and track a top level heading.
        $tl = $1;
        next;
    }
    if(/^(\w+)\[(\d+)\]$/) {  # Or grab a key/value pair.
        my($k,$v) = ($1,$2);
        push @{$h{$tl}{$k}}, $v; # push the value into the right bucket.
        next;
    }
    warn "Unrecognized format cannot be processed at $.: (($_))\n";
}

# Sort the top level headers alphabetically and numerically.
# Uses a Schwartzian Transform so that we don't need to recompute
# sort keys repeatedly.
my @topkeys = map  {$_->[0]}
              sort {$a->[1] cmp $b->[1] || $a->[2] <=> $b->[2]} 
              map  {
                my($alpha, $num) = $_ =~ m/^=(\w+)(\d+)$/; 
                [$_, $alpha, $num]
              } keys %h;

# Now iterate through the structure in sorted order, generate range
# strings on the fly, and print our output.
foreach my $top (@topkeys) {
    print "$top\n";
    foreach my $k (sort keys %{$h{$top}}) {
        my @vl = sort {$a <=> $b} @{$h{$top}{$k}};
        my $range = num2range(@vl);
        print "$k\[$range]\n";
    }
}

sub num2range {
  local $_ = join ',' => @_;
  s/(?<!\d)(\d+)(?:,((??{$++1}))(?!\d))+/$1-$+/g;
  return $_;
}

__DATA__
=IP1
abc[0]
abc[1]
abc[2]
=IP2
def[4]
def[8]
def[9]

The following output is produced:

=IP1
abc[0-2]
=IP2
def[4,8-9]

This solution could be optimized further if answers to some of the questions that Borodin asked as a comment to your original post were answered. For example, it would be unnecessary to sort our number list before generating a range if we knew that the numbers were already in order. And some complexity (and computational work) might be eliminated if we knew more about what "abc" and "def" are. And if sorted order doesn't matter, we could simplify further while also reducing the amount of work being done.

Also, the Set::IntSpan module could probably provide a more robust approach to generate a range string, and is probably worth considering if this script is intended to live beyond the "one off" lifespan. If you choose to use Set::IntSpan your num2range sub could look like this:

sub num2range{ return Set::IntSpan->new(@_) }

The Set::IntSpan object has overloaded stringification, so printing it gives a text representation of the range. If you went this route, you could eliminate the code that sorts the lists of numbers -- that's handled by Set::IntSpan internally.

回答2:

Okay, here's my take on a solution. Without any better information on the incoming data it may be more complicated than necessary

It keeps the data -- both the =IP headers and the xyz[9] values in the same order that they are first encountered. I've separated out the generation of the number range contraction to subroutine ranges

It's simply a matter of reading the data in from the file -- which it expects as a parameter on the command line -- into data structures %data and @order and printing them out again. The @order array and the _order subkey of the hash are there to preserve the sequence that the values are encountered and are added to whenever a new key is inserted into the corresponding hash

use strict;
use warnings;

my ($key, %data, @order);

while ( <> ) {

  chomp;

  if ( /^=/ ) {
    $key = $_;
    push @order, $key unless $data{$key};
    $data{$key} = { _order => [] };
  }
  elsif ( my ($key2, $n) = /([^\[\]\s]+)\[(\d+)\]/ ) {
    my $data = $data{$key};
    push @{ $data->{_order} }, $key2 unless $data->{$key2};
    push @{ $data->{$key2} }, $n; 
  }
}

for my $key ( @order ) {

    print $key, "\n";

    my $data = $data{$key};

    for my $key2 ( @{ $data->{_order} } ) {
        printf "%s[%s]\n", $key2, ranges( sort { $a <=> $b } @{ $data->{$key2} } );
    }

}

sub ranges {

    my @ranges;
    my ($start, $end);

    for my $n ( @_ ) {
        if ( not defined $start ) {
            $start = $end = $n;
        }
        elsif ( $n == $end + 1 ) {
            $end = $n;
        }
        else {
            push @ranges, $start == $end ? $start : "$start-$end";
            $start = $end = $n;
        }
    }

    push @ranges, $start == $end ? $start : "$start-$end" if defined $start;
    join ',', @ranges;
}

output

=IP1
abc[0-2]
=IP2
def[4,8-9]

来源：https://stackoverflow.com/questions/31779141/perl-scripting-using-hashes

标签

perl

hash