Awk - Count Each Unique Value and Match Values Between Two Files

我只是一个虾纸丫 submitted on 2021-02-05 12:19:45

Question


I have two files. First, I want to count how many times each unique value occurs in column 4 of File1.

Then I want to match each of those unique values against column 2 of File2.

In other words: for every unique value in column 4 of File1 that also appears in column 2 of File2, I want to print that value together with its count.

File1

1 2 3 6 5
2 3 4 5 1
3 5 7 6 1
2 3 4 6 2
4 6 6 5 1

File2

hello "6"
hi "5"

needed output

total count of hello,6 : 3
total count of hi,5 : 2

my test code

awk 'NR==FNR{a[$4]++}NR!=FNR{gsub(/"/,"",$2);b[$2]=$0}END{for( i in b){printf "Total count of %s,%d : %d\n",gensub(/^([^ ]+).*/,"\1","1",b[i]),i,a[i]}}' File1 File2

I believe I should be able to do this with awk, but for some reason I am really struggling with this one.

Thanks


Answer 1:


Yes, this can be done. Here is a somewhat verbose awk version (it requires GNU awk, because gensub is a GNU extension that is not part of POSIX awk):

tink@box ~/tmp$ awk 'NR==FNR{a[$4]++}NR!=FNR{gsub(/"/,"",$2);b[$2]=$0}END{for( i in b){printf "Total count of %s,%d : %d\n",gensub(/^([^ ]+).*/,"\\1","1",b[i]),i,a[i]}}' File1 File2
Total count of hi,5 : 2
Total count of hello,6 : 3

A few explanatory words:

NR == FNR {  # while we're on the first file, count all values in column 4
        a[$4]++
}
NR != FNR { # on the second file, strip the quotes from field two and use
            # field 2 as the index of the array for the second file
        gsub(/"/, "", $2)
        b[$2] = $0
}
# END rule(s)
END { # after both files were processed, pull a match for every line in the 
      # second array, and print it with the count of the occurrences in File1
        for (i in b) {
                printf "Total count of %s,%d : %d\n", gensub(/^([^ ]+).*/, "\\1", "1", b[i]), i, a[i]
        }
}
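
If gensub is not available, much the same result can probably be had in POSIX awk by storing only the word from field 1 instead of the whole line, so nothing has to be extracted again in the END rule. The following is a minimal sketch along those lines, not the code from this answer: the sample data is recreated from the question in a temporary directory, and the array names count and label are made up for illustration.

```shell
#!/bin/sh
# Recreate the question's sample data in a scratch directory (assumption:
# both files are whitespace-separated exactly as shown in the question).
tmp=$(mktemp -d)
printf '%s\n' '1 2 3 6 5' '2 3 4 5 1' '3 5 7 6 1' '2 3 4 6 2' '4 6 6 5 1' > "$tmp/File1"
printf '%s\n' 'hello "6"' 'hi "5"' > "$tmp/File2"

# POSIX awk only, no gensub. "count" and "label" are hypothetical names.
result=$(awk '
    NR == FNR { count[$4]++; next }   # File1: tally the values in column 4
    {
        gsub(/"/, "", $2)             # File2: strip the quotes from field 2
        label[$2] = $1                # remember the word that goes with the value
    }
    END {
        for (v in label)
            printf "Total count of %s,%d : %d\n", label[v], v, count[v]
    }
' "$tmp/File1" "$tmp/File2" | sort)

printf '%s\n' "$result"
rm -rf "$tmp"
```

The output is piped through sort only because the order in which for (v in label) visits the array is unspecified in awk. On the sample data this prints the hello,6 line with a count of 3, followed by the hi,5 line with a count of 2.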



Answer 2:


$ cat tst.awk
BEGIN { FS = "[[:space:]\"]+" }
NR==FNR {
    cnt[$4]++
    next
}
{ printf "total count of %s,%d : %d\n", $1, $2, cnt[$2] }

$ awk -f tst.awk File1 File2
total count of hello,6 : 3
total count of hi,5 : 2
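
The field separator does most of the work in this script: with FS set to a run of whitespace or double-quote characters, a File2 line such as hello "6" splits so that $2 is already the bare number, and no gsub is needed. The small sketch below is not part of the answer; it only prints the fields awk sees for that one line, assuming an awk that supports POSIX character classes such as [[:space:]].

```shell
#!/bin/sh
# Show how FS = "[[:space:]\"]+" splits one line taken from File2.
fields=$(printf '%s\n' 'hello "6"' |
    awk 'BEGIN { FS = "[[:space:]\"]+" }
         { for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
```

The closing quote after the 6 is itself a separator, so awk reports a third, empty field at the end of the line. That is harmless here because the script only reads $1 and $2.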


Source: https://stackoverflow.com/questions/65469676/awk-count-each-unique-value-and-match-values-between-two-files
