finding rows from file2 in file1 which have extended columns in file2

时光怂恿深爱的人放手 submitted on 2019-12-11 23:17:49

Question


I have file1 as:

ABC CDEF HAGD CBDGCBAHS:ATSVHC
NBS JHA AUW MNDBE:BWJW
DKW QDW OIW KNDSK:WLKJW
BNSHW JBSS IJS BSHJA
ABC CDEF CBS 234:ATSVHC
DKW QDW FSD 634:WLKJW

and file2:

ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
KAB GCBS YSTW SHSEB:AGTW:THE:193

I want to compare file1 and file2 based on columns 1, 2, 3 and 4, except that column 4 in file2 has extra colon-separated parts appended to it. I am currently using:

awk 'FNR==NR{seen[$1,$2,$3,$4];next} ($1,$2,$3,$4) in seen' file1 file2

What can I tweak so that the comparison works and my output is the matched lines from file2:

ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253

Answer 1:


Just include : in the FS, so that the colon-separated parts of column 4 become separate fields:

$ awk -F'[ :]' 'NR==FNR{a[$1,$2,$3,$4,$5];next} ($1,$2,$3,$4,$5) in a' file1 file2
ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
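
To see why this works, it can help to print how a line from each file is split once the colon is part of FS. The following is only a small sketch: it recreates one line from each of the question's sample files in a scratch directory, and the names tmp and fields_out are illustrative, not part of the original answer.

```shell
# Recreate one line from each sample file in a scratch directory (illustrative only).
tmp=$(mktemp -d)
printf '%s\n' 'ABC CDEF HAGD CBDGCBAHS:ATSVHC' > "$tmp/file1"
printf '%s\n' 'ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123' > "$tmp/file2"

# With FS set to "space or colon", print the field count and the first five fields.
fields_out=$(awk -F'[ :]' '{print "NF=" NF, $1, $2, $3, $4, $5}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$fields_out"

rm -rf "$tmp"
```

The file1 line splits into 5 fields and the file2 line into 7, but the first five fields are identical, which is why the key ($1,$2,$3,$4,$5) matches. Note that this relies on column 4 in file1 containing exactly one colon; a file1 line without a colon in column 4 would leave $5 empty.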



Answer 2:


As I understand it, you want to print lines from file2 whose fields 1, 2 and 3 match the corresponding fields in file1 and whose field 4 begins with field 4 from file1. In that case:

$ awk 'FNR==NR{seen[$1,$2,$3,$4];next} {a=$4; sub(/:[^:]*:[^:]*$/, "", a)} ($1,$2,$3,a) in seen' file1 file2
ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253

How it works

  • FNR==NR{seen[$1,$2,$3,$4];next}

    While reading the first file, file1, we add to the associative array seen a key which is equal to the first four fields. We then skip the rest of the commands and jump to the next line.

  • a=$4; sub(/:[^:]*:[^:]*$/, "", a)

    If we get to here, that means we are working on file2.

    This assigns the value of field 4 to variable a and then removes the last two colon-separated strings from a.

  • ($1,$2,$3,a) in seen

    This prints any line in file2 for which the first three fields and a are a key in associative array seen.
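
The steps above can be traced on the question's sample data. This is a minimal sketch, assuming a scratch directory created with mktemp; the names tmp and matched, and the " <- " separator in the output, are only for illustration. It prints the trimmed key a next to each file2 line that matches.

```shell
# Recreate the sample files from the question in a scratch directory (illustrative only).
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
ABC CDEF HAGD CBDGCBAHS:ATSVHC
NBS JHA AUW MNDBE:BWJW
DKW QDW OIW KNDSK:WLKJW
BNSHW JBSS IJS BSHJA
ABC CDEF CBS 234:ATSVHC
DKW QDW FSD 634:WLKJW
EOF
cat > "$tmp/file2" <<'EOF'
ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
KAB GCBS YSTW SHSEB:AGTW:THE:193
EOF

# Same logic as the one-liner, but also print the trimmed key for each match.
matched=$(awk '
  FNR==NR { seen[$1,$2,$3,$4]; next }          # file1: remember the four-field key
  { a = $4; sub(/:[^:]*:[^:]*$/, "", a) }      # file2: drop the last two colon parts
  ($1,$2,$3,a) in seen { print a " <- " $0 }   # report trimmed key and original line
' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$matched"

rm -rf "$tmp"
```

The KAB line is not printed because its trimmed key, SHSEB:AGTW, never appears in file1. The regular expression assumes the extension in file2 is always exactly two colon-separated parts; a different number of extra parts would need a different pattern.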



Source: https://stackoverflow.com/questions/38882681/finding-rows-from-file2-in-file1-which-have-extended-columns-in-file2
