How can I find if a row exists in a file and add a column with the filename using awk?

醉酒当歌 submitted on 2020-03-05 04:27:26

Question


I'm trying to find if a row in a file already exists in another file, and, in that case, add a column with the filename.

File1:

 CHROM  POS REF ALT
 chr1   10  T   A
 chr1   12  T   G
 chr1   12  T   C

File2:

 CHROM  POS REF ALT
 chr1   12  T   C
 chr1   13  A   T

I want to check if any row in file2 is in file1.

Expected output:

 CHROM  POS REF ALT
 chr1   10  T   A
 chr1   12  T   G
 chr1   12  T   C   file2

I've tried with this code:

 `awk -F"\t" 'FNR==NR
 {
   seen[$0];next
  }($0 in seen)
 {
   delete seen[$0]
 };
   END{
    for (x in seen);$(NF+1)="file";print
       }
  {print}' OFS="\t" file2  file1`

But this is not working as expected. This is what I'm getting:

 CHROM  POS REF ALT
  chr1  10  T   A
  chr1  12  T   G
  chr1  12  T   C
  chr1  12  T   C   file2

How could I delete the duplicated row? Thanks!


Answer 1:


Could you please try the following.

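awk '
# Header of the first file read (file2): print it once.
FNR==1 && FNR==NR{
  print
  next
}
# Remaining rows of file2: remember each row, keyed by the whole line.
FNR==NR{
  a[$0]=FILENAME
  next
}
# Data rows of file1: print the row, then the file name if the row was seen in file2.
FNR>1{
  print $0,$0 in a?OFS a[$0]:""
}'  file2  file1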

Output will be as follows.

CHROM  POS REF ALT
chr1   10  T   A 
chr1   12  T   G 
chr1   12  T   C  file2

NOTE: If the input files are TAB-delimited and the output should be TAB-delimited too, add a BEGIN section at the start of the awk program, like awk 'BEGIN{FS=OFS="\t"}....
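For reference, here is a minimal sketch of how the TAB-delimited variant could look when written out in full. The scratch directory, the `printf` lines that recreate the sample data, and the names `workdir`, `result` and `seen` are assumptions made only for this demo. It also differs slightly from the command above: the file name is concatenated onto the row instead of being passed as a second `print` argument, so unmatched rows carry no trailing separator.

```shell
# Work in a scratch directory so FILENAME is plain "file2" (demo assumption).
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Tab-delimited copies of the sample data from the question.
printf 'CHROM\tPOS\tREF\tALT\nchr1\t10\tT\tA\nchr1\t12\tT\tG\nchr1\t12\tT\tC\n' > file1
printf 'CHROM\tPOS\tREF\tALT\nchr1\t12\tT\tC\nchr1\t13\tA\tT\n' > file2

result=$(awk '
BEGIN { FS = OFS = "\t" }            # tab-delimited input and output
FNR == NR {                          # first file on the command line (file2)
  if (FNR > 1) seen[$0] = FILENAME   # remember each data row, skipping the header
  next
}
FNR == 1 { print; next }             # header of file1: print unchanged
{
  # Append the file name only when the whole row was seen in file2.
  print $0 (($0 in seen) ? OFS seen[$0] : "")
}' file2 file1)

printf '%s\n' "$result"
```

With the sample data, only the row `chr1 12 T C` should gain a fifth column containing `file2`. Note that the match is on the entire line, so rows that differ only in whitespace would not be treated as equal.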



Source: https://stackoverflow.com/questions/57753300/how-can-i-find-if-a-row-exists-in-a-file-and-add-a-column-with-the-filename-usin
