An efficient way to transpose a file in Bash

时光说笑 2020-11-22 03:30

I have a huge tab-separated file formatted like this

X column1 column2 column3
row1 0 1 2
row2 3 4 5
row3 6 7 8
row4 9 10 11

I would like t

29 answers
  •  独厮守ぢ
    2020-11-22 03:39

    Here is a moderately solid Perl script to do the job. There are many structural analogies with @ghostdog74's awk solution.

    #!/bin/perl -w
    #
    # SO 1729824
    
    use strict;
    
    my(%data);          # main storage
    my($maxcol) = 0;
    my($rownum) = 0;
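    while (<>)
    {
        # Split on runs of whitespace; note that empty tab-separated fields collapse.
        my(@row) = split /\s+/;
        my($colnum) = 0;
        foreach my $val (@row)
        {
            $data{$rownum}{$colnum++} = $val;    # store the cell at (row, column)
        }
        $rownum++;
        $maxcol = $colnum if $colnum > $maxcol;  # remember the widest row seen
    }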
    
    my $maxrow = $rownum;
    for (my $col = 0; $col < $maxcol; $col++)
    {
        for (my $row = 0; $row < $maxrow; $row++)
        {
            printf "%s%s", ($row == 0) ? "" : "\t",
                    defined $data{$row}{$col} ? $data{$row}{$col} : "";
        }
        print "\n";
    }
    

    With the sample data size, the performance difference between Perl and awk was negligible (1 millisecond out of 7 total). With a larger data set (a 100x100 matrix, entries 6-8 characters each), Perl slightly outperformed awk: 0.026s vs 0.042s. Neither is likely to be a problem.
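    To repeat a comparison like this, a test matrix of similar shape is needed. The following is only a sketch of one way to build it; `make_matrix` is a hypothetical helper, not the tool used for the timings above, and the random 6-8 digit numbers are an assumption standing in for the unspecified entries.

```shell
# Hypothetical generator (an assumption, not the author's tooling):
# print a rows x cols tab-separated matrix of random 6-8 digit numbers.
make_matrix() {
    awk -v rows="$1" -v cols="$2" 'BEGIN {
        srand()
        for (r = 1; r <= rows; r++) {
            line = ""
            for (c = 1; c <= cols; c++) {
                # 100000 + [0, 99900000) gives values that are 6 to 8 digits long.
                value = sprintf("%d", 100000 + int(rand() * 99900000))
                sep = (c > 1) ? "\t" : ""
                line = line sep value
            }
            print line
        }
    }'
}

# Example: make_matrix 100 100 > matrix.tsv
```

    The output can then be fed to any of the transpose scripts under `time`, as in the transcripts in this answer.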


    Representative timings for Perl 5.10.1 (32-bit) vs awk (version 20040207 when given '-V') vs gawk 3.1.7 (32-bit) on Mac OS X 10.5.8, using a file containing 10,000 lines with 5 columns per line:

    Osiris JL: time gawk -f tr.awk xxx  > /dev/null
    
    real    0m0.367s
    user    0m0.279s
    sys     0m0.085s
    Osiris JL: time perl -f transpose.pl xxx > /dev/null
    
    real    0m0.138s
    user    0m0.128s
    sys     0m0.008s
    Osiris JL: time awk -f tr.awk xxx  > /dev/null
    
    real    0m1.891s
    user    0m0.924s
    sys     0m0.961s
    Osiris-2 JL: 
    

    Note that gawk is vastly faster than awk on this machine, but still slower than perl. Clearly, your mileage will vary.
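    For comparison, the same store-everything-then-print approach can be sketched in awk, wrapped in a shell function. This is a minimal illustration and not @ghostdog74's code or the `tr.awk` timed above; the name `transpose_awk` is hypothetical. Unlike the Perl script, it splits strictly on tabs, so empty fields keep their position.

```shell
# Hypothetical helper (not from the original answers): transpose a
# tab-separated file by holding every cell in memory, then printing
# the cells column by column.
transpose_awk() {
    awk -F'\t' '
    {
        # Store each field under its (row, column) position.
        for (col = 1; col <= NF; col++)
            cell[NR, col] = $col
        if (NF > maxcol)
            maxcol = NF
    }
    END {
        # One output line per input column; missing cells print as empty.
        for (col = 1; col <= maxcol; col++) {
            line = cell[1, col]
            for (row = 2; row <= NR; row++)
                line = line "\t" cell[row, col]
            print line
        }
    }' "$@"
}

# Usage on a cut-down version of the sample from the question:
printf 'X\tcolumn1\tcolumn2\nrow1\t0\t1\nrow2\t3\t4\n' | transpose_awk
```

    Like the Perl version, this keeps the whole file in memory, so it suits files that fit comfortably in RAM.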
