Question
I'm trying to find out whether a row of one file already exists in another file and, if it does, add a column containing that other file's name.
File1:
CHROM POS REF ALT
chr1 10 T A
chr1 12 T G
chr1 12 T C
File2:
CHROM POS REF ALT
chr1 12 T C
chr1 13 A T
I want to check if any row in file2 is in file1.
Expected output:
CHROM POS REF ALT
chr1 10 T A
chr1 12 T G
chr1 12 T C file2
I've tried this code:
`awk -F"\t" '
FNR==NR{ seen[$0]; next }
($0 in seen){ delete seen[$0] };
END{ for (x in seen); $(NF+1)="file"; print }
{ print }' OFS="\t" file2 file1`
But this is not working as expected. This is what I'm getting:
CHROM POS REF ALT
chr1 10 T A
chr1 12 T G
chr1 12 T C
chr1 12 T C file2
How could I delete the duplicated row? Thanks!
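Why the duplicate appears: in the attempt, `for (x in seen);` ends with a semicolon, so the loop body is empty. The END block therefore runs `$(NF+1)="file";print` just once, on the last record of file1 still held in `$0` (chr1 12 T C), which the final `{print}` rule has already printed. Below is a minimal sketch of the same `seen` idea with the END block dropped and the column appended in the main rule instead. It assumes TAB-separated input (as the `-F"\t"` in the attempt suggests) and recreates the sample files in a temporary directory only so that it runs on its own.

```shell
#!/bin/sh
# Work in a throwaway directory; the sample data mirrors the question (TAB-separated).
cd "$(mktemp -d)" || exit 1
printf 'CHROM\tPOS\tREF\tALT\nchr1\t10\tT\tA\nchr1\t12\tT\tG\nchr1\t12\tT\tC\n' > file1
printf 'CHROM\tPOS\tREF\tALT\nchr1\t12\tT\tC\nchr1\t13\tA\tT\n' > file2

result=$(awk -F'\t' -v OFS='\t' '
FNR==NR      { seen[$0] = FILENAME; next }  # first file (file2): remember each row and its file name
FNR==1       { print; next }                # header of file1: print it unchanged
($0 in seen) { $(NF+1) = seen[$0] }         # row also in file2: append the file name as a new field
             { print }                      # every row of file1 is printed exactly once; no END block
' file2 file1)
printf '%s\n' "$result"
```

The `FNR==1` rule is needed because the header line of file1 also occurs in file2 and would otherwise get the file name appended.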
Answer 1:
Could you please try the following.
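awk '
FNR==1 && FNR==NR{          # header of the first file (file2): print it once
  print
  next
}
FNR==NR{                    # remaining rows of file2: remember each row and its file name
  a[$0]=FILENAME
  next
}
FNR>1{                      # data rows of file1 (its header is skipped)
  print $0 ($0 in a ? OFS a[$0] : "")   # append OFS + file name only when the row was seen in file2
}' file2 file1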
The output will be:
CHROM POS REF ALT
chr1 10 T A
chr1 12 T G
chr1 12 T C file2
NOTE: If the input files are TAB-delimited and the output should be TAB-delimited too, add a BEGIN section right after awk, e.g. awk 'BEGIN{FS=OFS="\t"} ...'.
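With the BEGIN section from the note in place, the answer can be run as in the sketch below. It assumes TAB-separated inputs (the sample files are recreated in a temporary directory only so it runs on its own) and concatenates the optional column onto `$0` rather than passing it as a second `print` argument. That way a matching row gets exactly one TAB before the file name and non-matching rows get no trailing TAB.

```shell
#!/bin/sh
# Throwaway directory with TAB-separated copies of the sample files from the question.
cd "$(mktemp -d)" || exit 1
printf 'CHROM\tPOS\tREF\tALT\nchr1\t10\tT\tA\nchr1\t12\tT\tG\nchr1\t12\tT\tC\n' > file1
printf 'CHROM\tPOS\tREF\tALT\nchr1\t12\tT\tC\nchr1\t13\tA\tT\n' > file2

result=$(awk 'BEGIN{FS=OFS="\t"}              # TAB in, TAB out
FNR==1 && FNR==NR { print; next }             # header of file2 (the first file): print it once
FNR==NR { a[$0]=FILENAME; next }              # other rows of file2: remember their file name
FNR>1 { print $0 ($0 in a ? OFS a[$0] : "") } # file1 data rows: add TAB + file name on a match
' file2 file1)
printf '%s\n' "$result"
```

Unmatched rows are printed as the untouched `$0`, so their original separators are kept; only the appended column uses OFS.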
Source: https://stackoverflow.com/questions/57753300/how-can-i-find-if-a-row-exists-in-a-file-and-add-a-column-with-the-filename-usin