How to validate the count with group-by data in Unix

Submitted by 大憨熊 on 2020-01-14 06:04:34

Question


I have a list of records as follows:

Source:

a,yes
a,yes
b,No
c,N/A
c,N/A
c,N/A
d,xyz
d,abc
d,abc

Output:

a, Yes 2
b, No 1
c, N/A 3
d, xyz 1
d, abc 2

c, N/A "File is not correct"

Here 'Yes' and 'No' are the acceptable words. If any other word's count is greater than the 'Yes' or 'No' word count for an individual $1 value, then we have to issue a statement like "file is not good".

I have tried the script below:

awk -F, '{a[$1]++;}END{for (i in a)print i, a[i];}' filetest.txt

Answer 1:


If you are not worried about the output order (it may differ from the order in Input_file), then the following may help.

awk -F, '{array[$1", "$2]++;} /yes/{y++;next} /No/{n++;next} /N\/A/{count++;next} END{;for(i in array){printf("%s %s%s\n",i,array[i],(count>y && count>n) && i ~ /N\/A/?RS i" File is not correct":"")}}'  Input_file

EDIT: Adding a multi-line form of the solution as well.

awk -F, '{
array[$1", "$2]++;
}
/yes/{
  y++;
  next
}
/No/{
  n++;
  next
}
/N\/A/{
  count++;
  next
}
END{;
  for(i in array){
     printf("%s %s%s\n",i,array[i],(count>y && count>n) && i ~ /N\/A/?RS i" File is not correct":"")
}
}'  Input_file

EDIT2: As per the OP, N/A shouldn't be hardcoded, so the following code counts the yes lines, the No lines, and each of the remaining second-field values separately. It then compares the largest of the remaining counts with the yes and No counts and, based on that, prints the lines as per the OP's request.

awk -F, '{
array[$1", "$2]++;
}
/yes/{
  y++;
  next
}
/No/{
  n++;
  next
}
{
  count[$2]++;
}
END{
  for(i in count){
    val=val>count[i]?val:count[i]
};
  for(i in array){
    printf("%s %s%s\n",i,array[i],(val>y && val>n) &&(i !~ /yes/ && i !~ /No/)?RS i" File is not correct":"")
}
}'   Input_file

After running the above code I get the following:

./script.ksh
d, xyz 1
d, xyz File is not correct
c, N/A 3
c, N/A File is not correct
b, No 1
a, yes 2
d, abc 2
d, abc File is not correct
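
Note that the output above also flags d, xyz and d, abc, while the expected output in the question flags only c, N/A. A minimal sketch of one way to get there in POSIX awk, assuming the intended rule is that a value other than yes/no is flagged only when its own count exceeds both the total yes count and the total No count. The helper name check_counts and the file name sample.txt are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count each "key, value" pair, then flag any value that
# is not yes/no and whose own count exceeds both the yes and the no totals.
check_counts() {
    awk -F, '
    {
        pair[$1 ", " $2]++                 # occurrences of each key/value pair
        v = tolower($2)                    # assumption: yes/no match in any case
        if (v == "yes")     yes++          # total yes lines in the file
        else if (v == "no") no++           # total no lines in the file
        else                other[$1 ", " $2] = 1
    }
    END {
        for (p in pair)
            print p " " pair[p]
        for (p in other)
            if (pair[p] > yes && pair[p] > no)
                print p " File is not correct"
    }' "$1"
}

# Sample data copied from the question.
printf '%s\n' a,yes a,yes b,No c,N/A c,N/A c,N/A d,xyz d,abc d,abc > sample.txt
check_counts sample.txt
```

As with the code above, the order of the count lines is whatever the for (p in pair) loop yields, so it may differ between awk implementations.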



Answer 2:


With GNU awk for true multi-dimensional arrays:

$ cat tst.awk
BEGIN { FS=","; OFS=", " }
{ cnt[$1][$2]++ }
END {
    for (key in cnt) {
        for (val in cnt[key]) {
            cur = cnt[key][val]
            print key, val " " cur
            if (tolower(val) ~ /^(yes|no)$/) {
                maxGood = (maxGood > cur ? maxGood : cur)
            }
            else {
                badCnt[key][val] = cur
            }
        }
    }

    print ""
    for (key in badCnt) {
        for (val in badCnt[key]) {
            if (badCnt[key][val] > maxGood) {
                print key, val " File is not correct"
            }
        }
    }
}

$ awk -f tst.awk file
a, yes 2
b, No 1
c, N/A 3
d, abc 2
d, xyz 1

c, N/A File is not correct

Use tolower() in other places, or remove it, as appropriate: it depends on whether your $2 data really can be upper or lower case or that's just a mistake in your example, and on whether you want a case difference treated as an error.

The output will be in random order courtesy of the in operator - that's easily changed to any other order if you care.
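
If GNU awk is not available, or a fixed order is wanted, the same idea can be sketched in POSIX awk. This is only an illustration under assumptions: pairs are stored under a single $1 SUBSEP $2 index instead of an array of arrays, the count lines are piped through the external sort command, and count_sorted is a hypothetical helper name. The flagged lines after the blank line are still printed in for-in order.

```shell
#!/bin/sh
# Hypothetical helper: the counting logic without GNU awk arrays of arrays.
# usage: count_sorted file
count_sorted() {
    awk -F, '
    { cnt[$1, $2]++ }                       # index is $1 SUBSEP $2
    END {
        for (idx in cnt) {
            split(idx, part, SUBSEP)        # part[1] = key, part[2] = value
            line = part[1] ", " part[2]
            print line " " cnt[idx] | "sort"
            if (tolower(part[2]) ~ /^(yes|no)$/) {
                if (cnt[idx] > maxGood) maxGood = cnt[idx]
            } else {
                bad[line] = cnt[idx]
            }
        }
        close("sort")                       # let sort print the counts first
        print ""
        for (line in bad)
            if (bad[line] > maxGood)
                print line " File is not correct"
    }' "$1"
}
```

Closing the sort pipe before the remaining print statements is what keeps the sorted counts ahead of the blank line and the flagged entries.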




Answer 3:


#!/bin/sh

FILE=1.txt

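# Read each distinct line whole, then count exact, literal matches of it
# (-x: whole line, -F: fixed string, so "a,yes" does not also match "aa,yes").
sort -u "$FILE" | while IFS= read -r r; do
    count=$(grep -cxF -e "$r" "$FILE")
    echo "$r $count"
done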
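
The same per-line count can also be produced without a loop by letting uniq -c do the counting. A minimal sketch, assuming the count should end up at the end of each line as in the script above; count_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count identical lines with sort + uniq -c, then move
# the count from the front of each line to the end.
# usage: count_lines file
count_lines() {
    sort "$1" | uniq -c |
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 " " n }'
}
```

This reads the file once instead of running grep once per distinct line. Like the loop, it only counts the records and does not perform the yes/No validation.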


Source: https://stackoverflow.com/questions/45675372/how-to-get-the-validate-the-count-with-the-group-by-data-in-unix
