Using awk how do I print all lines containing duplicates of specific columns?

Submitted by 微笑、不失礼 on 2020-01-17 06:56:34

Question


Input:

a;3;c;1
a;4;b;2
a;5;c;1

Output:

a;3;c;1
a;5;c;1

That is, every line whose values in columns 1, 3 and 4 also occur together on at least one other line should be printed.


Answer 1:


If a 2-pass approach is OK:

$ awk -F';' '{key=$1 FS $3 FS $4} NR==FNR{cnt[key]++;next} cnt[key]>1' file file
a;3;c;1
a;5;c;1
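For readers unfamiliar with the NR==FNR idiom, here is the same two-pass command expanded with comments. This is a sketch assuming a POSIX-compatible awk and a ;-separated input file. The wrapper name `print_dup_rows` and the temporary sample file are hypothetical and exist only to make the example self-contained.

```shell
#!/bin/sh
# Hypothetical wrapper around the two-pass command above, expanded with comments.
print_dup_rows() {
    awk -F';' '
        # Build the key from columns 1, 3 and 4, joined with FS so that,
        # for example, fields "ab","c" and "a","bc" give different keys.
        { key = $1 FS $3 FS $4 }

        # First pass: NR==FNR holds only while reading the first file argument.
        # Count each key, then skip the second-pass rule below.
        NR == FNR { cnt[key]++; next }

        # Second pass: print (default action) lines whose key occurred more than once.
        cnt[key] > 1
    ' "$1" "$1"
}

# Usage with the sample input from the question.
tmp=$(mktemp)
printf 'a;3;c;1\na;4;b;2\na;5;c;1\n' > "$tmp"
print_dup_rows "$tmp"    # prints a;3;c;1 and a;5;c;1
rm -f "$tmp"
```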

Otherwise, in a single pass that buffers the lines in memory:

$ awk -F';' '
    { key=$1 FS $3 FS $4; a[key,++cnt[key]]=$0 }
    END {
        for (key in cnt)
            if (cnt[key] > 1)
                for (i=1; i<=cnt[key]; i++)
                    print a[key,i]
    }
' file
a;3;c;1
a;5;c;1

The order in which the second script prints the groups of keys is unspecified, because the for (key in cnt) loop visits array indices in an implementation-defined order. This is easy to fix if it matters.
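One possible fix, sketched under the assumption that grouping output by key is acceptable: remember each key the first time it is seen and loop over that list in END instead of using for (key in cnt). The function name `print_dup_rows_ordered` is hypothetical; the buffering logic is otherwise that of the second script.

```shell
#!/bin/sh
# Hypothetical helper: single-pass variant of the second script that prints
# groups in the order their keys first appear, not in "for (key in ...)" order.
print_dup_rows_ordered() {
    awk -F';' '
        {
            key = $1 FS $3 FS $4
            # Record each new key in order of first appearance.
            if (!(key in cnt)) order[++nkeys] = key
            # Buffer the whole line under (key, occurrence number).
            rows[key, ++cnt[key]] = $0
        }
        END {
            # Walk keys in first-seen order; print only keys seen more than once.
            for (k = 1; k <= nkeys; k++) {
                key = order[k]
                if (cnt[key] > 1)
                    for (i = 1; i <= cnt[key]; i++)
                        print rows[key, i]
            }
        }
    ' "$@"
}

# Usage: reads files given as arguments, or stdin if there are none.
printf 'a;3;c;1\na;4;b;2\na;5;c;1\n' | print_dup_rows_ordered    # prints a;3;c;1 and a;5;c;1
```

Lines are still printed grouped by key (all copies of the first duplicated key, then the next), not in their original file order. The two-pass versions keep file order because they filter the file on the second read.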




Answer 2:


Give this one-liner a try:

awk -F';' '{k=$1 FS $3 FS $4}
    NR==FNR{if(a[k]){p[a[k]];p[NR]}a[k]=NR;next}FNR in p' file file

It reads the file twice: the first pass records the line numbers that should be printed, and the second pass prints those lines.
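The same one-liner, expanded with comments to show how the line-number bookkeeping works. This is a sketch assuming a POSIX-compatible awk; `mark_and_print_dups` is a hypothetical wrapper name, and the awk statements are those of the one-liner, only reformatted.

```shell
#!/bin/sh
# Hypothetical wrapper around the one-liner above, expanded with comments.
mark_and_print_dups() {
    awk -F';' '
        { k = $1 FS $3 FS $4 }
        NR == FNR {
            # a[k] holds the line number of the most recent line with key k.
            # NR starts at 1, so a[k] is non-zero (true) once k has been seen.
            if (a[k]) {
                p[a[k]]     # mark the earlier line ...
                p[NR]       # ... and the current one as "to be printed"
            }
            a[k] = NR
            next
        }
        # Second pass: FNR restarts at 1, so it matches the line numbers stored in p.
        FNR in p
    ' "$1" "$1"
}

# Usage with the sample input from the question.
tmp=$(mktemp)
printf 'a;3;c;1\na;4;b;2\na;5;c;1\n' > "$tmp"
mark_and_print_dups "$tmp"    # prints a;3;c;1 and a;5;c;1
rm -f "$tmp"
```

Because the second pass simply filters the file by line number, matching lines come out in their original order. A line marked more than once is still printed only once, since p acts as a set of line numbers.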




Answer 3:


Here is my solution:

awk 'BEGIN{ FS=";" } NR==1{ split($0, a, ";"); print } NR>1{ if (a[1] == $1 && a[3] == $3 && a[4] == $4) print }' file

Output:

a;3;c;1
a;5;c;1

Of course, this only works if the line whose columns are duplicated is the first line of the file (and the first line is always printed, even if it has no duplicate).



Source: https://stackoverflow.com/questions/43675831/using-awk-how-do-i-print-all-lines-containing-duplicates-of-specific-columns
