join multiple files

春和景丽 2020-12-02 21:11

I am using the standard join command to join two sorted files on column 1. The command is simple: join file1 file2 > output_file.

But how do I join 3 or more files?

8 Answers
  • 2020-12-02 21:50

    Assuming you have four files A.txt, B.txt, C.txt and D.txt as:

    ~$ cat A.txt
    x1 2
    x2 3
    x4 5
    x5 8
    
    ~$ cat B.txt
    x1 5
    x2 7
    x3 4
    x4 6
    
    ~$ cat C.txt
    x2 1
    x3 1
    x4 1
    x5 1
    
    ~$ cat D.txt
    x1 1
    

    Join the files with:

    firstOutput='0,1.2'      # key plus the fields collected so far
    secondOutput='2.2'       # value column of the file being added
    myoutput="$firstOutput,$secondOutput"
    outputCount=3            # next field index in tmp.tmp

    join -a 1 -a 2 -e 0 -o "$myoutput" A.txt B.txt > tmp.tmp
    for f in C.txt D.txt; do
        firstOutput="$firstOutput,1.$outputCount"
        myoutput="$firstOutput,$secondOutput"
        join -a 1 -a 2 -e 0 -o "$myoutput" tmp.tmp "$f" > tempf
        mv tempf tmp.tmp
        outputCount=$((outputCount + 1))
    done
    mv tmp.tmp files_join.txt
    

    Results:

    ~$ cat files_join.txt 
    x1 2 5 0 1
    x2 3 7 1 0
    x3 0 4 1 0
    x4 5 6 1 0
    x5 8 0 1 0
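
To see what the options do, it may help to look at the first step on its own. The sketch below recreates A.txt and B.txt from above in a scratch directory (the names workdir and step1 are only for illustration) and runs the initial join: -a 1 -a 2 keeps unpaired lines from both files, -o selects the output fields (0 is the join field), and -e 0 fills missing fields with 0.

```shell
# Scratch directory so the example does not touch existing files.
workdir=$(mktemp -d)

# Same sample data as A.txt and B.txt above (already sorted on column 1).
printf '%s\n' 'x1 2' 'x2 3' 'x4 5' 'x5 8' > "$workdir/A.txt"
printf '%s\n' 'x1 5' 'x2 7' 'x3 4' 'x4 6' > "$workdir/B.txt"

# Full outer join: key, value from A, value from B, with 0 for missing values.
step1=$(join -a 1 -a 2 -e 0 -o '0,1.2,2.2' "$workdir/A.txt" "$workdir/B.txt")
printf '%s\n' "$step1"

rm -rf "$workdir"
```

Keys found in only one file still appear: x3 comes out as "x3 0 4" and x5 as "x5 8 0". Each later pass of the loop adds one more column in the same way.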
    
  • 2020-12-02 21:54

    I know this is an old question, but for future reference: if the files you want to combine follow a naming pattern like the one in the question (file1 file2 file3 ... fileN), you can simply combine them with this command:

    cat file* > output
    

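    Here output will contain the contents of the files concatenated in alphabetical order. Note that cat only appends the files one after another; it does not match lines on a common column the way join does.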
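
A quick way to compare concatenating with joining, as a sketch on two made-up files (file1, file2 and their contents are hypothetical sample data, not taken from the question):

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' > "$workdir/file1"
printf '%s\n' 'a 3' 'b 4' > "$workdir/file2"

# cat appends file2 after file1: four lines, each key appears twice.
cat_out=$(cat "$workdir"/file*)

# join merges lines that share the first field: two lines, one per key.
join_out=$(join "$workdir/file1" "$workdir/file2")

printf 'cat:\n%s\n\njoin:\n%s\n' "$cat_out" "$join_out"
rm -rf "$workdir"
```

The cat result has four lines ("a 1", "b 2", "a 3", "b 4"), while the join result has two ("a 1 3", "b 2 4").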

  • 2020-12-02 21:56

    While this is a bit of an old question, this is how you can do it with a single awk:

    awk -v j=<field_number> '{key=$j; $j=""}  # get key and delete field j
                             (NR==FNR){order[FNR]=key;} # store the key-order
                             {entry[key]=entry[key] OFS $0 } # update key-entry
                             END { for(i=1;i<=FNR;++i) {
                                      key=order[i]; print key entry[key] # print
                                   }
                             }' file1 ... filen
    

    Notes on this script:

    • all files must have the same number of lines
    • the output follows the key order of the first file
    • files do not need to be sorted on field <field_number>
    • <field_number> must be a valid integer
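
A possible run of this approach on small sample files, as a sketch: f1, f2 and f3 are made-up files holding the same keys in different orders, and j=1 selects the first column as the key. The extra `$0 = $0; $1 = $1` step is an addition for this example, not part of the script above; it re-splits the record so the emptied key field does not leave doubled separators in the output.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'x1 2' 'x2 3' 'x3 5' > "$workdir/f1"
printf '%s\n' 'x3 6' 'x1 5' 'x2 7' > "$workdir/f2"   # unsorted on purpose
printf '%s\n' 'x2 1' 'x3 1' 'x1 9' > "$workdir/f3"

awk_out=$(awk -v j=1 '
    { key = $j; $j = ""; $0 = $0; $1 = $1 }   # take the key, drop field j, tidy spacing
    NR == FNR { order[FNR] = key }             # remember the key order of the first file
    { entry[key] = entry[key] OFS $0 }         # append the remaining fields per key
    END { for (i = 1; i <= FNR; ++i) { key = order[i]; print key entry[key] } }
' "$workdir/f1" "$workdir/f2" "$workdir/f3")

printf '%s\n' "$awk_out"
rm -rf "$workdir"
```

With these inputs the output is "x1 2 5 9", "x2 3 7 1", "x3 5 6 1", in the key order of f1 even though f2 and f3 are ordered differently.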
  • 2020-12-02 21:57

    One can join multiple files (N>=2) by constructing a pipeline of joins recursively:

    #!/bin/sh
    
    # multijoin - join multiple files
    
    join_rec() {
        if [ $# -eq 1 ]; then
            join - "$1"
        else
            f=$1; shift
            join - "$f" | join_rec "$@"
        fi
    }
    
    if [ $# -le 2 ]; then
        join "$@"
    else
        f1=$1; f2=$2; shift 2
        join "$f1" "$f2" | join_rec "$@"
    fi
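
For three files, the recursion above unrolls to a plain pipeline in which - makes join read its first input from standard input. A sketch with made-up sample files (f1, f2, f3 and their contents are hypothetical):

```shell
workdir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' 'c 3' > "$workdir/f1"
printf '%s\n' 'a 4' 'b 5' 'd 6' > "$workdir/f2"
printf '%s\n' 'a 7' 'b 8' 'c 9' > "$workdir/f3"

# What the script would run for three arguments f1 f2 f3.
pipe_out=$(join "$workdir/f1" "$workdir/f2" | join - "$workdir/f3")

printf '%s\n' "$pipe_out"
rm -rf "$workdir"
```

Because plain join is an inner join, only keys present in every file survive: here that is "a 1 4 7" and "b 2 5 8", while c and d are dropped.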
    
  • 2020-12-02 21:57

    The man page of join states that it only works on two files. So you need to create an intermediate file, which you delete afterwards, e.g.:

    join file1 file2 > temp
    join temp file3 > output
    rm temp
    
  • 2020-12-02 22:04

    join joins lines of two files on a common field. If you want to join more, do it in pairs: join the first two files, then join the result with the third file, and so on.
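
The pairwise approach can be wrapped in a loop. The following is a minimal sketch, not a standard tool: join_all is a hypothetical helper name, it uses mktemp for the intermediate files, and it assumes every input is already sorted on the first column.

```shell
# join_all FILE1 FILE2 [FILE3 ...]: fold join over the files, left to right.
join_all() {
    acc=$(mktemp) || return 1      # running result so far
    next=$(mktemp) || return 1     # scratch file for the next step
    cat "$1" > "$acc"
    shift
    for f in "$@"; do
        join "$acc" "$f" > "$next"     # join the result so far with one more file
        mv "$next" "$acc"
    done
    cat "$acc"
    rm -f "$acc" "$next"
}

# Usage with made-up sample data.
workdir=$(mktemp -d)
printf '%s\n' 'a 1' 'b 2' > "$workdir/f1"
printf '%s\n' 'a 3' 'b 4' > "$workdir/f2"
printf '%s\n' 'b 5' 'c 6' > "$workdir/f3"
loop_out=$(join_all "$workdir/f1" "$workdir/f2" "$workdir/f3")
printf '%s\n' "$loop_out"
rm -rf "$workdir"
```

With this sample data only b appears in all three files, so the result is the single line "b 2 4 5". The helper's variables are not local in plain sh, which is harmless here because it is called inside a command substitution.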
