Question
I've been trying to solve this problem for a while and have checked many posts (for example "Print lines in one file matching patterns in another file" and "awk search for a field in another file") without finding what I am looking for. I need a solution that uses bash tools like sed, grep, awk (no Python, R, ...).
I have two files (the real ones are much bigger than these):
file1:
   2   891299  0.50923964E-02     1248   4.713       1349.08
   3   245857  0.57915542E-02     1335   4.671       1369.65
file2:
   278    2645  2334659  0.75142      0.53123
   279    2643   245857  0.80439      0.56868
   500    1341   830677  0.74922      0.52958
   501    1339   882791  0.87685      0.61980
   502    1337   891299  0.63224      0.44680
In this example I want to look up the value in column 2 of file1 in column 3 of file2 and print column 1 of the matching file2 line, for every line of file1 and in the order given by file1.
A possible solution (which I am aware is not bug-free) is the following unacceptably slow bash loop:
for i in `awk '{print $2}' file1` ; do grep " $i " file2 | awk '{print $1}' ; done
which prints to screen:
502
279
Please note that a 'simple' solution like:
awk 'NR==FNR{pats[$2]; next} $3 in pats' file1 file2
is not appropriate, because the output order follows file2 rather than file1 (i.e. it prints 279 first and then 502).
Thanks a lot for your help.
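Answer 1:
You can reverse the order in which awk processes the files: read file2 first to build a lookup from column 3 to column 1, then read file1 so that the output follows its order: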
awk 'NR==FNR{pats[$3]=$1; next} $2 in pats{print pats[$2]}' file2 file1
502
279
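For readers who want to try this, the one-liner above can be expanded into a commented sketch. It recreates the two sample files from the question and wraps the same awk program in a helper; the name `lookup_ids` is hypothetical and used only for illustration.

```shell
# Recreate the two sample files from the question.
cat > file1 <<'EOF'
   2   891299  0.50923964E-02     1248   4.713       1349.08
   3   245857  0.57915542E-02     1335   4.671       1369.65
EOF
cat > file2 <<'EOF'
   278    2645  2334659  0.75142      0.53123
   279    2643   245857  0.80439      0.56868
   500    1341   830677  0.74922      0.52958
   501    1339   882791  0.87685      0.61980
   502    1337   891299  0.63224      0.44680
EOF

# lookup_ids KEYFILE TABLEFILE  (hypothetical helper, not from the answer)
# Prints column 1 of TABLEFILE for each column-2 key of KEYFILE, in KEYFILE order.
lookup_ids() {
  awk '
    # While reading the first file given to awk (the table):
    # remember column 1 under the key found in column 3.
    NR == FNR { id[$3] = $1; next }
    # While reading the second file (the keys):
    # print the stored value if column 2 is a known key.
    $2 in id  { print id[$2] }
  ' "$2" "$1"
}

lookup_ids file1 file2
```

Two caveats, assuming the data may differ from the sample: if a key occurs more than once in column 3 of file2, only the last matching line is kept (the grep loop would print every match), and the `NR == FNR` test misbehaves when the first file awk reads is empty. Keys in file1 with no match in file2 are silently skipped.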
Source: https://stackoverflow.com/questions/33389595/find-patterns-of-a-file-in-another-file-and-print-out-a-corresponding-field-of-t