Combine multiple text files using awk

问题

I have some minutely stats saved in text files and named as 1min.txt, 2min.txt etc.

1min.txt
F1,21
F2,32
F3,22

2min.txt
F2,12
F4,32

I would like to combine these files in the following format:

combined.txt
Field   1min    2min
F1      21      0
F2      32      12
F3      22      0
F4      0       32

Some fields may not exist in some files and 0 will be entered for those fields.

I've tried to do it using awk but couldn't find an easy way. Can someone please help?

Thanks

回答1:

Using awk:

awk -F, '
!seen[FILENAME]++ { 
    fname[++numFile] = FILENAME 
}
{
    flds[$1]++;
    map[FILENAME,$1] = $2
}
END {
    printf "%-10s", "FIELD";
    for (cnt=1; cnt<=numFile; cnt++) {
        file = fname[cnt];
        sub (/.txt/, "", file);
        printf "%-10s", file; 
    }
    print ""; 
    for (fld in flds) { 
        printf "%-10s", fld; 
        for (cnt=1; cnt<=numFile; cnt++) {
            printf "%-10s", map[fname[cnt],fld]+0
        }
        print "";
    }
}' 1min.txt 2min.txt

Output:

FIELD      1min       2min      
F1         21         0         
F2         32         12        
F3         22         0         
F4         0          32

Once you have reviewed the output, you can re-direct the output to another file. You can pass as many files at the end as you want. If you have way too many then you can even use shell glob, for eg: *.txt

Note: I haven't guaranteed the order of fields since they are not always present in all files.

Here is a pure fun perl japh that will do the same:

perl -F, -lane'
$f{$ARGV}++; $h{$F[0]}
{$ARGV}=      $F[  1  ]
}{print       join"\t",
"FIELD",      map{s/.[tx]+
//x           ;$_}sort{$a
<=>$b}        keys%f;print
join"\n",    map{$f
=$_;         join
"\t",  $f,map
{$h{$f
}{$_}
//=0}
sort{$a
<=>$b}
keys%f
}sort
keys%h;
' *.txt

Output:

FIELD   1min    2min
F1      21      0
F2      32      12
F3      22      0
F4      0       32

回答2:

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
{ keys[$1]; val[$1,NR==FNR] = $2 }
END {
    print "Field", "1min", "2min"
    for (key in keys) {
        print key, val[key,1]+0, val[key,0]+0
    }
}

$ awk -f tst.awk 1min.txt 2min.txt
Field   1min    2min
F1      21      0
F2      32      12
F3      22      0
F4      0       32

If you care about the output order, tell us what order you're looking for - the order the keys were seen across both files or alphabetical or something else. If it's the order they are seen then that'd be:

$ cat tst.awk
BEGIN { FS=","; OFS="\t" }
!seen[$1]++ { keys[++numKeys] = $1 }
{ val[$1,NR==FNR] = $2 }
END {
    print "Field", "1min", "2min"
    for (k=1; k<=numKeys; k++) {
        key = keys[k]
        print key, val[key,1]+0, val[key,0]+0
    }
}

回答3:

Using join:

join -t , input1 input2 -j 1 -o "0 1.2 2.2" -e 0 -a1 -a2 | column -t -s,

Gives:

To add a header:

join -t , input1 input2 -j 1 -o "0 1.2 2.2" -e 0 -a1 -a2 | \
     sed '1iField,1min,2min' | column -t -s,

And the result looks like:

Field  1min  2min
F1     21    0
F2     32    12
F3     22    0
F4     0     32

回答4:

Awk allows you to explicitly read from files, so you can just put all the logic in a BEGIN section if you want. Here is an example:

awk -F, '
  BEGIN {
    while (getline <"1min.txt") {
      field[$1]=1
      a1[$1]=$2
    }
    while (getline <"2min.txt") {
      field[$1]=1
      a2[$1]=$2
    }
    print "Field\t1min\t2min"
    for (x in field) {
      print x"\t"(a1[x]+0)"\t"(a2[x]+0)
    }
  }
'

回答5:

I have written some python code to solve you problem.

fh_1 = open("1min.txt", "r")
fh_2 = open("2min.txt", "r")
fh_3 = open("combine.txt", "w")

min_c_1 = {}
min_c_2 = {}

lines_of_text = ["Field   1min    2min\n"]

for l1 in fh_1.readlines():
    data = l1.split(',')
    min_c_1[data[0]] = data[1].rstrip()

for l1 in fh_2.readlines():
    data = l1.split(',')
    min_c_2[data[0]] = data[1].rstrip()

for key in min_c_1.keys():
    if key in min_c_2.keys():
        msg = str(key) + "      " + str(min_c_1[key]) + "      " +  str(min_c_2[key]) + "\n"
        lines_of_text.append(msg)
        del min_c_2[key]
    else:
        msg = str(key) + "      " + str(min_c_1[key]) + "      0" + "\n"
        lines_of_text.append(msg)

for key in min_c_2.keys():
    msg = str(key) + "      0" + "       " + str(min_c_2[key]) + "\n"
    lines_of_text.append(msg)

fh_3.writelines(lines_of_text)

fh_1.close()
fh_2.close()
fh_3.close()

Please let me know if it does not help.

来源：https://stackoverflow.com/questions/25299081/combine-multiple-text-files-using-awk

标签

awk