properly join two files based on 2 columns in common

栀梦 2020-11-30 01:28

I have two files I'm trying to join/merge based on columns 1 and 2. They look something like this, with file1 (58210 lin…

  •  迷失自我
    2020-11-30 02:01

    awk 'NR==FNR{a[$1,$2]=$3;next} ($1,$2) in a{print $0, a[$1,$2]}' file1 file2
    

    Look:

    $ cat file1
    2L      5753   33158
    2L      8813   33158
    2L      7885   33159
    2L      1279   33159
    2L      5095   33158
    $
    $ cat file2
    2L      8813    0.6    1.2
    2L      5762    0.4    0.5
    2L      1279    0.5    0.9
    $
    $ awk 'NR==FNR{a[$1,$2]=$3;next} ($1,$2) in a{print $0, a[$1,$2]}' file1 file2
    2L      8813    0.6    1.2 33158
    2L      1279    0.5    0.9 33159
    $
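To reproduce the session above without creating the files by hand, the sample data can be written with here-documents and the same one-liner run on it. This is a minimal sketch that assumes a `sh` with `mktemp` available (as on typical Linux and macOS systems); the temporary directory and the `result` variable are illustrative additions, not part of the original answer.

```shell
#!/bin/sh
# Recreate the sample files from the session above and run the same join.
set -eu

# Work in a throwaway directory so no existing file1/file2 is overwritten.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

# file1: key in columns 1-2, value to carry over in column 3
cat > file1 <<'EOF'
2L      5753   33158
2L      8813   33158
2L      7885   33159
2L      1279   33159
2L      5095   33158
EOF

# file2: records to be extended with column 3 from file1
cat > file2 <<'EOF'
2L      8813    0.6    1.2
2L      5762    0.4    0.5
2L      1279    0.5    0.9
EOF

# First pass (NR==FNR) loads file1 into a; second pass prints file2
# records whose key was seen in file1, plus the stored column 3 value.
result=$(awk 'NR==FNR{a[$1,$2]=$3;next} ($1,$2) in a{print $0, a[$1,$2]}' file1 file2)
printf '%s\n' "$result"
```

It prints the same two lines as the session above; the `5762` record from file2 has no partner in file1 and is dropped.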
    

    If that's not what you want, please clarify and perhaps post some more representative sample input/output.

    Commented version of the above code to provide requested explanation:

    awk ' # START SCRIPT
    
    # IF the number of records read so far across all files is equal
    #    to the number of records read so far in the current file, a
    #    condition which is only true while reading the first file
    #    (provided file1 is not empty), THEN
    NR==FNR {
    
       # populate array "a" so that the value indexed by the first 2 fields
       # of this file1 record (awk joins them into one string key, separated
       # by SUBSEP) is that record's third field.
       a[$1,$2]=$3
    
       # Move on to the next record so we don't do any processing intended
       # for records from the second file. This is like an "else" for the
       # NR==FNR condition.
       next
    
    } # END THEN
    
    # We only reach this part of the code if the above condition is false,
    # i.e. if the current record is from file2, not from file1.
    
    # IF the array index constructed from the first 2 fields of the current
    #    record exists in array a, as would occur if these same values existed
    #    in file1, THEN
    ($1,$2) in a {
    
       # print the current record from file2 followed by the value from file1
       # that occurred at field 3 of the record that had the same values for
       # field 1 and field 2 in file1 as the current record from file2.
       print $0, a[$1,$2]
    
    } # END THEN
    
    ' file1 file2 # END SCRIPT
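One caveat with the `NR==FNR` test: if file1 is empty, NR and FNR stay equal while file2 is read, so file2's records are treated as if they came from file1. The sketch below counts how many records each test routes into the "file1" block, comparing `NR==FNR` with an alternative check, `FILENAME==ARGV[1]`. It assumes file1 is passed as a plain file name (not `-` or a `var=value` operand); the variable names `loaded_nr`/`loaded_fn` are just for the demo.

```shell
#!/bin/sh
# Show which records each test treats as "file1" when file1 is empty.
set -eu

tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

: > file1                                          # deliberately empty
printf '2L 8813 0.6 1.2\n2L 1279 0.5 0.9\n' > file2

# NR==FNR stays true throughout file2, so both of its records are
# counted (and stored in a) as if they came from file1.
loaded_nr=$(awk 'NR==FNR{a[$1,$2]=$3; n++; next} END{print n+0}' file1 file2)

# Comparing FILENAME with ARGV[1] only matches records actually read
# from the first file operand, so nothing is loaded here.
loaded_fn=$(awk 'FILENAME==ARGV[1]{a[$1,$2]=$3; n++; next} END{print n+0}' file1 file2)

echo "NR==FNR loaded: $loaded_nr"            # 2
echo "FILENAME==ARGV[1] loaded: $loaded_fn"  # 0
```

For the plain inner join above, both versions happen to print nothing with an empty file1, but the difference matters as soon as the second block should fire for unmatched file2 lines. If file1 may be empty, `NR==FNR` in the join can be swapped for `FILENAME==ARGV[1]`; the trade-off is that this check misbehaves if the same file name is passed twice or if `ARGV` is modified.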
    

    Hope that helps.
