Transform/pivot array in perl

a 夏天 提交于 2019-12-11 16:36:33

问题


Im stuck writing Perl code which transforms 2d array.

  • First column of the array is always date
  • Second column of the array is key that sorts.
  • Data is located in array "data" and is ordered by date and then key.

  • My situation should be understandable from the tables under. Unique values from the second column will be selected and later divided into columns header (green table)

It should work with and number of columns or dates/keys.

Structure before

Structure after

My code:
#creates filtered array of all the unique dates and its count
my @date          = @{ $data->[0] };
my @filtDate      = uniq @date;
my $countFiltDate = scalar @filtDate;
#unique list of keys
my @klic     = @{ $data->[1] };
my @filtKlic = uniq @klic;
#orders filtered keys
@filtKlic = sort @filtKlic;
my $countFiltKlic = scalar @filtKlic;
#count of columns
my $columnsCount = scalar @{$data};
#test code - counts how many new number columns to make.
my $columnsCountAfter = ( $columnsCount - 2 ) * $countFiltKlic;

#inserst filtered dates into first column
my $dataGraph;
for ( my $i = 0; $i < $countFiltDate; $i++ ) {
    $dataGraph->[0]->[$i] = @filtDate[$i];
}

#biggest loop with number of dates
for ( my $k = 0; $k < $countFiltDate; $k++ ) {
    my $l;
    my $c;
    #columns sount k $i
    for ( my $i = 0; $i < $columnsCount - 2; $i++ ) {
        #loop for different keys k $j
        for ( my $j = 0; $j < $countFiltKlic; $j++ ) {
            $l++;    #riadok v prvej tabulke
                     #EVERYTHING after this part is written horibly.
                     # I'm trying to make it work even
                     #if key values are missing.
            for ( my $m = 0; $m < 5; $m++ ) {
                if ( $data->[1]->[ $l - 1 ] eq $filtKlic[$j] ) {
                    print " [" . $data->[1]->[ ( $l - 1 ) ] . ',' . $filtKlic[$j] . "]";
                    $dataGraph->[ $l + $c ]->[$k] = $data->[ $i + 2 ]->[ ( ( $k * $countFiltKlic ) + $j ) ];
                    #print " [".$data->[1]->[$j].','.($filtKlic[($j)])."]-";
                    print " [" . ( $i + 2 ) . ',' . ( ( $k * $countFiltKlic ) + $j ) . "]-";
                    print " [" . ( $l + $c ) . ',' . $k . "]<br>";
                    $m = 5;    #just random number... i don't want to get infinite loops during testing

                } else {
                    if ( $m = 5 ) {
                        $l--;
                        $c++;
                    }
                    $j++;
                }
            }
        }
    }
}

my @nameStlpceKlic;
@nameStlpceKlic[0] = "date";
my $o;
for ( my $i = 0; $i < $columnsCount - 2; $i++ ) {
    foreach (@filtKlic) {
        my $o;
        $o++;
        @nameStlpceKlic[$o] = @filtKlic[ ( $o - 1 ) ] . $i;
    }
}

I have 2 problems.

  1. How to make sure that this will work even if some of the key are missing at some dates.
  2. How to write it properly. My code is too clumsy.

回答1:


Here is my general approach for solving this kind of problem.

In the second table, you're grouping your data by the date, then displaying the values for number1 and the values for number2. This should give you a hint as to how you want to organise your data structure and what you need to index for printing.

Your current data is (I assume) stored in an array of arrays. I was too lazy to copy the values, so I made my own AoA with made up values. I've sprinkled comments through the code so you can see how I worked on this.

my $arr = [
  ['date','key','number1','number2'],
  ['22.12.2013','1a','1a1-34567','1a2-1234567'],
  ['22.12.2013','2b','2b1-3249871','2b2-4597134'],
  ['22.12.2013','3c','3c1-1234567',''],
  ['22.12.2013','4d','4c1-3249871','4c2-4597134'],
  ['22.13.2013','1a','1a1-34567','1a2-1234567'],
  ['22.13.2013','2b','','2b2-4597134'],
  ['22.13.2013','3c','3c1-1234567','3c2-1234567'],
  ['22.13.2013','4d','4c1-3249871','4c2-4597134'],
];

# first, remove the first row, which contains the column headers.
my $col_h = shift @$arr;

my $data;
my $key_list;
foreach (@$arr) {
    my %hash;
    # use a hash slice with the column header array as keys
    # and the array as the values
    @hash{@$col_h} = @$_;
    # store this hash in a data hash indexed by date then key
    $data->{ $hash{date} }{ $hash{key} } = \%hash;
    # compile a separate hash with the keys in it
    $key_list->{ $hash{key} }++;
}

# make a sorted list of keys, ready for printing
my @key_list = sort keys %$key_list;

# remove the first two elements from the column headers ('date' and 'key')
splice(@$col_h, 0, 2);

# print out the header row for the table (I'm doing a simple tab-delim'd table)
print STDERR "Date\t";
# for each NUMBER from NUMBER1 ... NUMBERn
foreach my $c (@$col_h) {
    # print "keyID NUMBERn"
    map { print STDERR "$_ $c\t" } @key_list;
}
print STDERR "\n";

# Now print out the data itself. Sort by date...
foreach my $date (sort keys %$data) {
    print STDERR "$date\t";
    # for each NUMBER1 ... NUMBERn
    foreach my $header (@$col_h) {
        foreach my $key (@key_list) {
            ## print out the value OR - if there is no value
            print STDERR ( $data->{$date}{$key}{$header} || "-" ) . "\t";
        }
    }
    print STDERR "\n"; # end of the table row
}

Output (with tabs expanded for display purposes):

Date        1a number1  2b number1  3c number1  4d number1  1a number2  2b number2  3c number2  4d number2  
22.12.2013  1a1-34567   2b1-3249871 3c1-1234567 4c1-3249871 1a2-1234567 2b2-4597134 -           4c2-4597134 
22.13.2013  1a1-34567   -           3c1-1234567 4c1-3249871 1a2-1234567 2b2-4597134 3c2-1234567 4c2-4597134



回答2:


I was able to put together code that works using great answer from "i alarmed alien" . First thing that is different is that my data are formatted as array of arrays in transposed way.

$arr1 = [ '2013-12-22', '2013-12-22' ]; 
$arr2 = [ 'Number1','Number2']; 
$arr3 = [ '2328942', '679204']; 
$arr4 = [ '1450398', '436713']; 

Also transformed data should be saved in an array. I've written this piece of code. ( It's far from perfect, if there are any suggestions how to improve it further I'd be happy to hear those.)

####################
#transpose data 
my $datas = $args{DATA};
my $headers = $args{HEADERS};
my @rows = ();
my @transposed = ();
for my $row (@$datas) {
  for my $column (0 .. $#{$row}) {
    push(@{$transposed[$column]}, $row->[$column]);
  }
}

#################################
my @arr = @transposed;
# first, define headers.
my $col_h = $args{HEADERS};
my $data;
my $key_list;
foreach (@arr) {
    my %hash;
    # use a hash slice with the column header array as keys
    # and the array as the values
    @hash{@$col_h} = @$_;
    # store this hash in a data hash indexed by date then key
    $data->{ $hash{date} }{ $hash{key} } = \%hash;
    # compile a separate hash with the keys in it
    $key_list->{ $hash{key} }++;
}
# make a sorted list of keys, ready for printing
my @key_list = sort keys %$key_list;
# remove the first two elements from the column headers ('date' and 'key')
splice(@$col_h, 0, 2);

my @output;
my @header;
# print out the header row for the table (I'm doing a simple tab-delim'd table)
#print STDERR "Date\t";
push(@header, "Date\t");
# for each NUMBER from NUMBER1 ... NUMBERn
foreach my $c (@$col_h) {
    # print "keyID NUMBERn"
    map { push (@header,"$_ $c\t" )} @key_list;
    #map { print STDERR "$_ $c\t" } @key_list;
}
#print STDERR "<br>";
push (@output,\@header );
my $row;
my $column;
# Now print out the data itself. Sort by date...
foreach my $date (sort keys %$data) {
    #print STDERR "$date\t";
    $row++;
    my @line;
    push(@line, "$date");
    # for each NUMBER1 ... NUMBERn
    foreach my $header (@$col_h) {  
        foreach my $key (@key_list) {
            ## print out the value OR - if there is no value
            $column++;
            push (@line,( $data->{$date}{$key}{$header} || "-" ) . "\t");
            #print STDERR ( $data->{$date}{$key}{$header} || "-" ) . "\t";
        }
    }
    print STDERR "<br>"; # end of the table row
    $column = 0;    
    push (@output,\@line );
}

    my $x = 1;
   return @output;

}   

This code works but it's little ugly. Please let me know If there is cleaner/better way to do this.



来源:https://stackoverflow.com/questions/25746610/transform-pivot-array-in-perl

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!