问题
I have 2 tab separated files with 2 columns. column1 1 is number and column 2 is ID. like these 2 examples:
example file1:
188 TPT1
133 ACTR2
420 ATP5C1
942 DNAJA1
example file1:
91 PSMD7
2217 TPT1
223 ATP5C1
156 TCP1
I want to find the common rows of 2 files based on column 2 (column ID) and make a new tab separated file in which there are 4 columns: column1 is ID (common ID) column2 is the number from file1, column3 is the number from file2 and column4 is the log2 values of ratio of columns 2 and 3 (which means log2(column2/column3)). for example regarding the ID "TPT1": 1st column is TPT1, column2 is 188, column3 is 2217 and column 4 is log2(188/2217) which is equal to -3.561494. here is a the expected output:
expected output:
TPT1 188 2217 -3.561494
ATP5C1 420 223 0.9133394
I am trying to do that in AWK using the following code:
awk 'NR==FNR { n[$2]=$0;next } ($2 in n) { print n[$2 '\t' $1] '\t' $1 '\t' log(n[$1]/$1)}' file1.txt file2.txt > result.txt
this code does not return what I expect. do you know how to fix it?
回答1:
$ awk -v OFS="\t" 'NR==FNR {n[$2]=$1;next} ($2 in n) {print $2, $1, n[$2], log(n[$2]/$1)/log(2)}' file1 file2
TPT1 2217 188 -3.5598
ATP5C1 223 420 0.913346
回答2:
I'd use join to actually merge the files instead of awk:
$ join -j2 <(sort -k2 file1.txt) <(sort -k2 file2.txt) |
awk -v OFS="\t" '{ print $1, $2, $3, log($2/$3)/log(2) }'
ATP5C1 420 223 0.913346
TPT1 188 2217 -3.5598
The join program, well, joins two files on a common value. It does require the files to be sorted based on the join column, but your examples aren't, hence the inline sorting of the data files. Its output is then piped to awk to compute the log2 of the numbers of each line and produce tab-delimited results.
Alternative using perl which gives you more default precision if you care about that (And don't want to mess with awk's CONVFMT variable):
$ join -j2 <(sort -k2 a.txt) <(sort -k2 b.txt) |
perl -lane 'print join("\t", @F, log($F[1]/$F[2])/log(2))'
ATP5C1 420 223 0.913345617745818
TPT1 188 2217 -3.55980420318967
回答3:
awk + sort approach
awk ' { print $0,FILENAME }' ellyx.txt ellyy.txt | sort -k2 -k3 | awk ' {c=$2;if(c==p) { print c,a,$1,log(a/$1)/log(2) }p=c;a=$1 } '
with the given inputs
$ cat ellyx.txt
188 TPT1
133 ACTR2
420 ATP5C1
942 DNAJA1
$ cat ellyy.txt
91 PSMD7
2217 TPT1
223 ATP5C1
156 TCP1
$ awk ' { print $0,FILENAME }' ellyx.txt ellyy.txt | sort -k2 -k3 | awk ' {c=$2;if(c==p) { print c,a,$1,log(a/$1)/log(2) }p=c;a=$1 } '
ATP5C1 420 223 0.913346
TPT1 188 2217 -3.5598
$
来源:https://stackoverflow.com/questions/54552140/combining-and-processing-2-tab-separated-files-in-awk-and-make-a-new-one