Print lines containing the same second field for more than 3 times in a text file

梦想的初衷 submitted on 2021-02-11 12:33:08

Question


Here is what I am doing.

The text file is comma-separated and has three fields, and I want to extract all the lines whose second field occurs more than three times.

Text file (filename is "text"):

11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
6,keyword2,content5
6,keyword2,content5
7,keyword1,content4
8,keyword1,content2
1,keyword1,content2

My command is below. Inside awk, it runs cat on the whole text file, greps for the second field of the current line, and counts the matching lines. If the count is greater than 2, it prints the whole line.

The command:

awk -F "," '{ "cat text | grep "$2 " | wc -l" | getline var; if ( 2 < var ) print $0}' text

However, the output contains only the first three consecutive lines. It does not also print the last three lines containing "keyword1", which occurs six times in the text.

Result:

11,keyword1,content1
4,keyword1,content3
5,keyword1,content2

My expected result:

11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
7,keyword1,content4
8,keyword1,content2
1,keyword1,content2

Can somebody tell me what I am doing wrong?


Answer 1:


It is relatively straightforward to make just two passes over the file. In the first pass, you count the number of occurrences of each value in column 2. In the second pass, you print out the rows where the value in column 2 occurs more than your threshold value of 3 times.

awk -F, 'FNR == NR { count[$2]++ }
         FNR != NR { if (count[$2] > 3) print }' text text

The first line of code handles the first pass; it counts the occurrences of each different value of the second column.

The second line of code handles the second pass; if the value in column 2 was counted more than 3 times, print the whole line.
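
As for why the original command stops after three lines: awk keeps a command pipe open until close() is called on it. The command string is identical for every keyword1 line, so the second and later getline calls hit end-of-file on the already-open pipe and leave var unchanged. Once the keyword2 lines set var to 2, it stays at 2 for the remaining keyword1 lines. A minimal sketch of a fix, assuming the sample data is recreated in a temporary directory (the temporary directory and the names cmd and result are illustrative, not from the question):

```shell
# Recreate the sample file from the question in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
cat > text <<'EOF'
11,keyword1,content1
4,keyword1,content3
5,keyword1,content2
6,keyword2,content5
6,keyword2,content5
7,keyword1,content4
8,keyword1,content2
1,keyword1,content2
EOF

# Same idea as the original command, but the pipe is closed after each read,
# so the pipeline is re-run (and var refreshed) for every input line.
result=$(awk -F "," '{
    cmd = "cat text | grep " $2 " | wc -l"   # same pipeline as the question
    cmd | getline var                         # read the count into var
    close(cmd)                                # without this, later reads hit EOF
    if (2 < var) print $0
}' text)

printf '%s\n' "$result"
```

This still launches a shell pipeline for every input line, and grep matches the keyword anywhere on the line rather than only in the second field, so the two-pass version is the better choice for anything but small files.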

The two-pass approach doesn't work if the input is only available on a pipe rather than as a file (so you can't make two passes over the data). In that case you have to work much harder.
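
One possible way to handle piped input is to buffer every line in memory while counting, then print the qualifying lines once the input ends. The following is only a sketch under that assumption: the array names line, key and count and the shell variable result are illustrative, and memory use grows with the size of the input.

```shell
# Feed the sample data through a pipe, so awk sees the input only once.
result=$(printf '%s\n' \
    '11,keyword1,content1' \
    '4,keyword1,content3' \
    '5,keyword1,content2' \
    '6,keyword2,content5' \
    '6,keyword2,content5' \
    '7,keyword1,content4' \
    '8,keyword1,content2' \
    '1,keyword1,content2' |
awk -F, '
    # Single pass: remember each line and its second field, and count the field.
    { line[NR] = $0; key[NR] = $2; count[$2]++ }
    # At end of input, replay the buffered lines in their original order.
    END {
        for (i = 1; i <= NR; i++)
            if (count[key[i]] > 3) print line[i]
    }
')

printf '%s\n' "$result"
```

With the sample data this prints the six keyword1 lines in their original order, because keyword2 occurs only twice.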



Source: https://stackoverflow.com/questions/31622675/print-lines-containing-the-same-second-field-for-more-than-3-times-in-a-text-fil
