This question is not Perl-specific, (although the unpack function will most probably figure into my implementation).
I have to deal with files where multipl
Toying with an answer to your question, I arrived at an interesting solution with a concise main loop:
while (<>) {
given($_) {
when (@{[ map $pattern{$_}, @expect]}) {}
default {
die "$0: line $.: expected " . join("|" => @expect) . "; got\n$_";
}
}
}
As you'll see below, %pattern is a hash of named patterns for the different formats, and given/when against an array of Regex objects performs a short-circuiting search to find the first match.
From this, you can infer that @expect is a list of names of formats we expect to find on the current line.
For a while, I was stuck on the case of multiple possible expected formats and how to know format just matched, but then I remembered (?{ code }) in regular expressions:
This zero-width assertion evaluates any embedded Perl code. It always succeeds, and its code is not interpolated.
This allows something like a poor man's yacc grammar. For example, the pattern to match and process format 1 is
fmt1 => qr/^ \*\* DEVICE \s+ (\S+) \s*$
(?{ $device->{attr1} = $1;
@expect = qw< fmt2 >;
})
/x,
After processing the input from your question, $device contains
{
'attr1' => '109523.69142',
'attr2' => '.981',
'attr3' => '561A',
'groups' => [
{
'date' => '10/MAY/2010',
'nnn' => [ '24.15.30', '13.45.03' ],
'records' => [
[ '05:03:01', 'AB23X', '15.67', '101325.72', '14', '31.30474', '13', '0' ],
[ '05:03:15', 'CR22X', '16.72', '101325.42', '14', '29.16264', '11', '0' ],
[ '06:23:51', 'AW41X', '15.67', '101323.9', '14', '31.264932', '19', '0' ],
],
},
{
'date' => '11/MAY/2010',
'nnn' => [ '24.07.13', '13.44.63' ],
'records' => [
[ '15:57:14', 'AB23X', '15.67', '101327.23', '14', '31.30474', '13', '0' ],
[ '15:59:59', 'CR22X', '16.72', '101331.88', '14', '29.16264', '11', '0' ],
],
}
],
}
I'm amused with the result, but for some reason Larry's advice in perlstyle comes to mind:
Just because you CAN do something a particular way doesn't mean that you SHOULD do it that way.
For completeness, a working program demonstrating the result is below.
#! /usr/bin/perl
use warnings;
use strict;
use feature ':5.10';
use re 'eval';
*ARGV = *DATA;
my $device;
my $record;
my @expect = qw/ fmt1 /;
my %pattern;
%pattern = (
fmt1 => qr/^ \*\* DEVICE \s+ (\S+) \s*$
(?{ $device->{attr1} = $1;
@expect = qw< fmt2 >;
})
/x,
fmt2 => qr/^ \s* (\S+) \s+ (\S+) \s*$
(?{ @{$device}{qw< attr2 attr3 >} = ($1,$2);
@expect = qw< fmt3 >;
})
/x,
# e.g., 10/MAY/2010 24.15.30,13.45.03
fmt3 => qr/^ (\d\d\/[A-Z]{3}\/\d{4}) \s+ (\S+) \s*$
(?{ my($date,$nnns) = ($1,$2);
push @{ $device->{groups} } =>
{ nnn => [ split m|,| => $nnns ],
date => $date };
@expect = qw< fmt4 >;
})
/x,
# e.g., 05:03:01 AB23X 15.67 101325.72
fmt4 => qr/^ (\d\d:\d\d:\d\d) \s+
(\S+) \s+ (\S+) \s+ (\S+)
\s*$
(?{ push @{ $device->{groups}[-1]{records} } =>
[ $1, $2, $3, $4 ];
@expect = qw< fmt4 fmt5 >;
})
/x,
# e.g., * 14 31.30474 13 0
fmt5 => qr/^\* \s+ (\d+) \s+
# tricky: possibly no whitespace after 9-char float
((?=\d{1,7}\.\d+)[\d.]{1,9}) \s*
(\d+) \s+ (\d+)
\s*$
(?{ push @{ $device->{groups}[-1]{records}[-1] } =>
$1, $2, $3, $4;
@expect = qw< fmt4 fmt3 fmt2 >;
})
/x,
);
while (<>) {
given($_) {
when (@{[ map $pattern{$_}, @expect]}) {}
default {
die "$0: line $.: expected " . join("|" => @expect) . "; got\n$_";
}
}
}
use Data::Dumper;
$Data::Dumper::Terse = $Data::Dumper::Indent = 1;
print Dumper $device;
__DATA__
**DEVICE 109523.69142
.981 561A
10/MAY/2010 24.15.30,13.45.03
05:03:01 AB23X 15.67 101325.72
* 14 31.30474 13 0
05:03:15 CR22X 16.72 101325.42
* 14 29.16264 11 0
06:23:51 AW41X 15.67 101323.9
* 14 31.26493219 0
11/MAY/2010 24.07.13,13.44.63
15:57:14 AB23X 15.67 101327.23
* 14 31.30474 13 0
15:59:59 CR22X 16.72 101331.88
* 14 29.16264 11 0