Question
I have file1 as:
ABC CDEF HAGD CBDGCBAHS:ATSVHC
NBS JHA AUW MNDBE:BWJW
DKW QDW OIW KNDSK:WLKJW
BNSHW JBSS IJS BSHJA
ABC CDEF CBS 234:ATSVHC
DKW QDW FSD 634:WLKJW
and file2:
ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
KAB GCBS YSTW SHSEB:AGTW:THE:193
I want to compare file1 and file2 on columns 1, 2, 3 and 4, except that column 4 in file2 carries an extra suffix that file1 does not have. I am currently using:
awk 'FNR==NR{seen[$1,$2,$3,$4];next} ($1,$2,$3,$4) in seen' file1 file2
What can I tweak so that the comparison works and my output is the matched lines from file2, like this:
ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
Answer 1:
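Just include : in the field separator (FS), so that the two colon-separated parts of column 4 in file1 become fields 4 and 5 in both files: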
$ awk -F'[ :]' 'NR==FNR{a[$1,$2,$3,$4,$5];next} ($1,$2,$3,$4,$5) in a' file1 file2
ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
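To see why five fields are compared here, it may help to look at how one file2 line splits when both space and colon act as separators. This is only an illustrative sketch; split_fields is a hypothetical helper name, not something from the answer.

```shell
# Hypothetical helper: print every field of each input line, numbered,
# using the same separator as the answer (a space or a colon).
split_fields() {
  awk -F'[ :]' '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
}

# Feed it the first line of file2 from the question.
printf '%s\n' 'ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123' | split_fields
```

Fields 1 to 5 are the parts that file1 also has; fields 6 and 7 (THE and 123) are the extension, which the comparison simply ignores.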
Answer 2:
As I understand it, you want to print the lines from file2 whose fields 1, 2 and 3 match the corresponding fields in file1, and whose field 4 begins with field 4 from file1 (followed by the extension). In that case:
$ awk 'FNR==NR{seen[$1,$2,$3,$4];next} {a=$4; sub(/:[^:]*:[^:]*$/, "", a)} ($1,$2,$3,a) in seen' file1 file2
ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
How it works

FNR==NR{seen[$1,$2,$3,$4];next}
While reading the first file, file1, we add to the associative array seen a key made up of the first four fields. We then skip the rest of the commands and jump to the next line.

a=$4; sub(/:[^:]*:[^:]*$/, "", a)
If we get to here, we are working on file2. This assigns the value of field 4 to the variable a and then removes the last two colon-separated strings from a.

($1,$2,$3,a) in seen
This prints any line in file2 for which the first three fields and a form a key in the associative array seen.
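
The steps above can be tried end to end with the sample data from the question. The following is a minimal sketch, assuming a POSIX shell with mktemp available; the names tmpdir and match_extended are hypothetical and only wrap the command from this answer.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
ABC CDEF HAGD CBDGCBAHS:ATSVHC
NBS JHA AUW MNDBE:BWJW
DKW QDW OIW KNDSK:WLKJW
BNSHW JBSS IJS BSHJA
ABC CDEF CBS 234:ATSVHC
DKW QDW FSD 634:WLKJW
EOF
cat > "$tmpdir/file2" <<'EOF'
ABC CDEF HAGD CBDGCBAHS:ATSVHC:THE:123
NBS JHA AUW MNDBE:BWJW:THE:243
DKW QDW OIW KNDSK:WLKJW:THE:253
KAB GCBS YSTW SHSEB:AGTW:THE:193
EOF

# Hypothetical wrapper: $1 is the reference file, $2 the file with extended column 4.
match_extended() {
  awk 'FNR==NR { seen[$1,$2,$3,$4]; next }            # first file: remember the keys
       { a = $4; sub(/:[^:]*:[^:]*$/, "", a) }        # second file: drop the last two :parts
       ($1,$2,$3,a) in seen' "$1" "$2"                # print the line if the key is known
}

match_extended "$tmpdir/file1" "$tmpdir/file2"
```

The KAB line of file2 is not printed because no line of file1 produces the same key. Note that the sub() pattern assumes the extension is always exactly two colon-separated strings; a different number of trailing parts would need a different pattern.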
Source: https://stackoverflow.com/questions/38882681/finding-rows-from-file2-in-file1-which-have-extended-columns-in-file2