Join two files using awk

南笙 2020-12-17 02:24

I have two tab-delimited files like the ones shown below:

file A

chr1   123 aa b c d
chr1   234 a  b c d
chr1   345 aa b c d
chr1   456 a  b c d
...         


        
2 answers
  •  心在旅途
    2020-12-17 02:33

    You can use join, but the pipeline gets complicated enough that it might be easier to switch to a more powerful language like Perl.

    join -11 -21 -o1.1,1.2,1.3,1.4,1.5,2.4,2.5 \
         <(sed 's/ \+/:/' fileA | sort) \
         <(sed 's/ \+/:/' fileB | sort) \
     | join -11 -22 -a1 -o1.1,1.2,1.3,1.4,1.5,1.6,1.7,2.5,2.6 \
         - <(sed 's/ \+\([^ ]\+\) \+\([^ ]\+\)/ \1:\2/' fileC | sort -k2) \
     | sed 's/:/ /'
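
The core trick in the pipeline above is that join matches on a single field only, so the two key columns are fused into one with sed, joined, and split apart again. A minimal sketch of just that step follows, assuming made-up sample rows and hypothetical file names (demoA.txt, demoB.txt); it writes the second input to a temporary file instead of using process substitution so that it also runs under plain sh, and it uses `sed -E 's/ +/:/'` as an equivalent spelling of the expression in the answer.

```shell
#!/bin/sh
# Sketch only: the sample rows and file names are hypothetical, not from the question.
export LC_ALL=C                      # sort and join must agree on collation
tmp=$(mktemp -d)
printf 'chr1 123 aa b\nchr1 234 a b\n' > "$tmp/demoA.txt"
printf 'chr1 123 x y\nchr1 345 p q\n'  > "$tmp/demoB.txt"

# Replace the first run of spaces with ':' so "chr1 123" becomes the single key "chr1:123".
fuse() { sed -E 's/ +/:/' "$1" | sort; }

fuse "$tmp/demoB.txt" > "$tmp/demoB.key"

# Join on the fused key (field 1 of both inputs), then turn ':' back into a space.
result=$(fuse "$tmp/demoA.txt" | join - "$tmp/demoB.key" | sed 's/:/ /')
printf '%s\n' "$result"
rm -rf "$tmp"
```

Only the key present in both inputs survives, because join without `-a` behaves like an inner join.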
    

    Perl solution, using a hash to remember all the information:

    #!/usr/bin/perl
    use warnings;
    use strict;
    
    #             key_start  key_end  keep_from  output
    my %files = (A => [0,      1,      2,       [0 .. 3]],
                 B => [0,      1,      2,       [-2, -1]],
                 C => [1,      2,      3,       [-2, -1]],
                );
    
    my %hash;
    
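    # Read every file; key each row on its two key columns and
    # store the remaining fields under that key, per file.
    for my $file (keys %files) {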
        open my $FH, '<', "file$file" or die "file$file: $!";
        while (<$FH>) {
            my @fields = split;
            $hash{"@fields[$files{$file}[0], $files{$file}[1]]"}{$file}
                = [ @fields[$files{$file}[2] .. $#fields] ];
        }
    }
    
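    # For each key, print the selected columns from A, B and C (in that order),
    # dropping any values that are undefined.
    for my $key (sort keys %hash) {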
        print $key, join(' ', q(),
                         grep defined, map {
                             @{ $hash{$key}{$_} }[@{ $files{$_}[-1] }]
                         } sort keys %files), "\n";
    }
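
Since the question asks about awk, the same hash idea can be sketched there for the two-file case. This is an illustration under assumptions, not the answerer's method: it assumes fileA and fileB are tab-delimited, that both are keyed on their first two columns (as in the %files table above), and that the last two fields of fileB should be appended; the sample rows are made up.

```shell
#!/bin/sh
# Sketch only: the sample rows are hypothetical; the real file layouts are only partly shown.
tmp=$(mktemp -d)
printf 'chr1\t123\taa\tb\tc\td\nchr1\t234\ta\tb\tc\td\n' > "$tmp/fileA"
printf 'chr1\t123\tx\ty\nchr1\t999\tp\tq\n' > "$tmp/fileB"

joined=$(awk -F'\t' -v OFS='\t' '
    # First file on the command line (fileB): remember its last two
    # fields under the two-column key.
    NR == FNR { extra[$1, $2] = $(NF-1) OFS $NF; next }
    # Second file (fileA): print each row, appending the fileB fields
    # when the same key was seen there.
    {
        if (($1, $2) in extra)
            print $0, extra[$1, $2]
        else
            print $0
    }
' "$tmp/fileB" "$tmp/fileA")
printf '%s\n' "$joined"
rm -rf "$tmp"
```

Unlike the first join stage above, this keeps fileA rows that have no match in fileB (a left join); to drop them, remove the else branch.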
    
