How to validate counts with group-by data in Unix

Question

I have a list of records as follows:

Source:

a,yes
a,yes
b,No
c,N/A
c,N/A
c,N/A
d,xyz
d,abc
d,abc

Output:

a, Yes 2
b, No 1
c, N/A 3
d, xyz 1
d, abc 2

c, N/A "File is not correct"

Here 'Yes' and 'No' are the acceptable words. If the count of any other word is greater than the 'Yes' or 'No' count for an individual $1 value, then we have to issue a statement like "file is not good".

I have tried the script below:

awk -F, '{a[$1]++;}END{for (i in a)print i, a[i];}' filetest.txt

Answer 1:

If you don't need the output in the same order as Input_file, the following may help.

awk -F, '{array[$1", "$2]++;} /yes/{y++;next} /No/{n++;next} /N\/A/{count++;next} END{;for(i in array){printf("%s %s%s\n",i,array[i],(count>y && count>n) && i ~ /N\/A/?RS i" File is not correct":"")}}'  Input_file

EDIT: Adding a multi-line form of the solution as well.

awk -F, '{
  array[$1", "$2]++
}
/yes/{
  y++
  next
}
/No/{
  n++
  next
}
/N\/A/{
  count++
  next
}
END{
  for(i in array){
    printf("%s %s%s\n",i,array[i],(count>y && count>n) && i ~ /N\/A/?RS i" File is not correct":"")
  }
}'  Input_file

EDIT2: As per the OP, N/A shouldn't be hardcoded. The following code counts the string yes, the string No, and every other second-field value. It then compares the largest of those other counts with the yes and No counts and, based on that, prints the lines the OP asked for.

awk -F, '{
  array[$1", "$2]++
}
/yes/{
  y++
  next
}
/No/{
  n++
  next
}
{
  count[$2]++
}
END{
  for(i in count){
    val=val>count[i]?val:count[i]
  }
  for(i in array){
    printf("%s %s%s\n",i,array[i],(val>y && val>n) &&(i !~ /yes/ && i !~ /No/)?RS i" File is not correct":"")
  }
}'   Input_file

Running the code above gives the following output:

./script.ksh
d, xyz 1
d, xyz File is not correct
c, N/A 3
c, N/A File is not correct
b, No 1
a, yes 2
d, abc 2
d, abc File is not correct

Answer 2:

With GNU awk for true multi-dimensional arrays:

$ cat tst.awk
BEGIN { FS=","; OFS=", " }
{ cnt[$1][$2]++ }
END {
    for (key in cnt) {
        for (val in cnt[key]) {
            cur = cnt[key][val]
            print key, val " " cur
            if (tolower(val) ~ /^(yes|no)$/) {
                maxGood = (maxGood > cur ? maxGood : cur)
            }
            else {
                badCnt[key][val] = cur
            }
        }
    }

    print ""
    for (key in badCnt) {
        for (val in badCnt[key]) {
            if (badCnt[key][val] > maxGood) {
                print key, val " File is not correct"
            }
        }
    }
}

$ awk -f tst.awk file
a, yes 2
b, No 1
c, N/A 3
d, abc 2
d, xyz 1

c, N/A File is not correct

Use tolower() in other places, or remove it, as appropriate. That depends on whether your $2 data really can be upper or lower case (or whether that's just a mistake in your example) and on whether you want a case mismatch treated as an error.
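As one possible reading of the tolower() remark above: if "Yes" and "yes" should count as the same value, the second field can be lowercased before it is counted. The sketch below assumes that is the intent. The name count_folded is hypothetical, and the pipe through sort is only there to make the order predictable.

```sh
#!/bin/sh
# count_folded: hypothetical helper, not part of the answer above.
# Counts (key, value) pairs after lowercasing the value, so that
# "Yes" and "yes" land in the same bucket.
count_folded() {
    awk -F, '
    { cnt[$1 ", " tolower($2)]++ }          # fold case before counting
    END { for (k in cnt) print k, cnt[k] }  # one line per distinct pair
    ' "$1" | LC_ALL=C sort
}
```

It would be called as count_folded filetest.txt. Drop the tolower() call if a case mismatch should instead show up as a separate (and possibly invalid) value.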

The output will be in random order courtesy of the in operator - that's easily changed to any other order if you care.
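If a fixed order matters and GNU awk is not available, the same idea can be sketched in POSIX awk. This is an illustration under stated assumptions, not the answer's own code: it uses a SUBSEP-joined key instead of arrays of arrays, tags each line with a leading "1" (counts) or "2" (errors), sorts, and strips the tag. The name count_and_validate is hypothetical, and the blank separator line from tst.awk is omitted.

```sh
#!/bin/sh
# count_and_validate: hypothetical helper name, POSIX awk only.
# Same rule as tst.awk: a non-yes/no value is flagged when its count
# exceeds the largest yes/no count seen anywhere in the file.
count_and_validate() {
    awk -F, '
    { cnt[$1 SUBSEP $2]++ }                  # count each (key, value) pair
    END {
        maxGood = 0
        for (k in cnt) {
            split(k, p, SUBSEP)              # p[1] = key, p[2] = value
            print "1 " p[1] ", " p[2] " " cnt[k]
            if (tolower(p[2]) ~ /^(yes|no)$/ && cnt[k] > maxGood) {
                maxGood = cnt[k]
            }
        }
        for (k in cnt) {
            split(k, p, SUBSEP)
            if (tolower(p[2]) !~ /^(yes|no)$/ && cnt[k] > maxGood) {
                print "2 " p[1] ", " p[2] " File is not correct"
            }
        }
    }' "$1" | LC_ALL=C sort | cut -d" " -f2-   # order lines, drop the tag
}
```

Called as count_and_validate file, this prints the count lines in sorted order followed by any error lines. With GNU awk, setting PROCINFO["sorted_in"] is another way to control the traversal order of for (key in array).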

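Answer 3:

#!/bin/sh

FILE=1.txt

# For each distinct line, count how many times it occurs in the file.
# -x and -F make grep match the whole line as a literal string.
sort -u "$FILE" | while IFS= read -r r; do
    count=$(grep -cxF -- "$r" "$FILE")
    echo "$r $count"
done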

Source: https://stackoverflow.com/questions/45675372/how-to-get-the-validate-the-count-with-the-group-by-data-in-unix

Tags

Linux

bash

shell

unix

awk