Categorize a column and count the number of warnings in a column

孤街浪徒 submitted on 2021-02-09 11:13:13

Question


I have a file called out.txt with the following content:

Statement 1                        Statement 2  Statement 3    Statement 4
The declaration is not done         /   Exp     /   *       /  This is expected
The declaration is starting/started /   St      /   *       /  This is not expected
The declaration is not yet designed /   Yt      /   &       /  This is a major one
The declaration is confirmed        /   Exp     /   *       /  This is okay
The declaration is not confirmed    /   Ntp     /   &       /  This is a major issue

I need to categorize and count the rows by column 3 (Statement 3): a * marks a Warning and an & marks an Error. The expected output is:

Out:
Warnings:
    Exp : 2
    St  : 1
Total : 3
Errors:
    Yt : 1
    Ntp: 1
Total :2

I tried the code below, but it does not produce the expected output:

#!/bin/bash
echo " " ;
File="out.txt"
for z in out.txt;
do
if grep -q "&" $z/"$File"; then
echo "$z:";
awk -F' / ' 
     { a[$2]++ }
     END{ for(j in a){ print j, a[j]; s=s+a[j] };
 print "Total :", s}' out.txt
else 
echo "$z:";
done

Answer 1:


EDIT2: Since the OP confirmed that there are no fixed keywords for errors, and that an error is identified by an & in the second-to-last field of the line, try the following.

awk -F'/' '
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  str=$(NF-1)
  gsub(/ +/,"",str)
  if(str=="&"){
     countEr[val]++
  }
  else{
     countSu[val]++
  }
  val=str=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
     print "\t"i,countSu[i]
     sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
     print "\t"i,countEr[i]
     sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
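
The EDIT2 idea of reading the marker from the second-to-last field can be taken one step further. The following is only a sketch, not the answerer's code: it reads both the code and the marker by counting fields from the end of the line, so the extra slash in "starting/started" does not matter. The function name summarize is hypothetical, entries are listed in first-seen order, and the sketch assumes that Statement 4 never contains a slash.

```shell
#!/bin/sh
# Sketch only: summarize is a hypothetical helper name, not part of the answer.
summarize() {
  awk -F'/' '
    NF >= 4 {                            # the header line has no slashes, so it is skipped
      code = $(NF-2); mark = $(NF-1)     # count fields from the end of the line
      gsub(/[ \t]+/, "", code)
      gsub(/[ \t]+/, "", mark)
      kind = (mark == "&") ? "Errors" : "Warnings"
      if (!((kind, code) in cnt))        # remember the order of first appearance
        order[kind, ++n[kind]] = code
      cnt[kind, code]++
      total[kind]++
    }
    END {
      print "Out:"
      report("Warnings")
      report("Errors")
    }
    function report(kind,   i, code) {
      print kind ":"
      for (i = 1; i <= n[kind]; i++) {
        code = order[kind, i]
        print "\t" code " : " cnt[kind, code]
      }
      print "Total : " (total[kind] + 0)
    }
  ' "$1"
}
```

Calling summarize out.txt on the sample file from the question should list Exp and St under Warnings and Yt and Ntp under Errors, with totals of 3 and 2.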


EDIT: A generic solution in which all error names are passed in a variable, so the conditions do not have to be written by hand as in my previous solution. Please try the following; it was written and tested with GNU awk, based only on the samples shown.

awk -v errors="Ntp,Yt"  '
BEGIN{
  num=split(errors,arr,",")
  for(i=1;i<=num;i++){
     errorVal[arr[i]]
  }
}
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  if(val in errorVal){
     countEr[val]++
  }
  else{
     countSu[val]++
  }
  val=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
     print "\t"i,countSu[i]
     sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
     print "\t"i,countEr[i]
     sumEr+=countEr[i]
  }
  print "Total:"sumEr
}'  Input_file

Explanation: A detailed, commented walkthrough of the original solution, which hard-codes the error names Ntp and Yt.

awk '                                                 ##Starting awk program from here.
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){        ##Using match to find whitespace, a slash, the value, whitespace and a slash, as per the samples.
  val=substr($0,RSTART,RLENGTH)                       ##Saving into val the substring that starts at RSTART and is RLENGTH characters long.
  gsub(/[[:space:]]+|\//,"",val)                      ##Removing all whitespace and slashes from val.
  if(val=="Ntp" || val=="Yt"){                        ##If val is either Ntp or Yt, then do the following.
     countEr[val]++                                   ##Increase count for array countEr with 1 with index of val here.
  }
  else{                                               ##Else do following.
     countSu[val]++                                   ##Increase count of array countSu with index of val here.
  }
  val=""                                              ##Nullifying val here.
}
END{                                                  ##Starting END block of this program here.
  print "Out:" ORS "Warnings:"                        ##Printing string Out new line and Warnings here.
  for(i in countSu){                                  ##Traversing through countSu here.
     print "\t"i,countSu[i]                           ##Printing tab index of array and value of CountSu here.
     sumSu+=countSu[i]                                ##Keep on adding value of countSu current item into sumSu variable here.
  }
  print "Total:"sumSu                                 ##Printing Total string with sumSu value here.
  print "Errors:"                                     ##Printing string Errors here.
  for(i in countEr){                                  ##Traversing through countEr here.
     print "\t"i,countEr[i]                           ##Printing tab index i and countEr value here.
     sumEr+=countEr[i]                                ##Keep on adding value of countEr current item into sumEr variable here.
  }
  print "Total:"sumEr                                 ##Printing Total string with sumEr value here.
}'  Input_file                                        ##Mentioning Input_file name here.
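
To see what match() leaves in RSTART and RLENGTH, it can help to run the pattern on a single shortened line. This is a minimal sketch: show_match is a hypothetical helper name and the input line is made up for illustration; only the pattern and the substr/gsub steps come from the answer.

```shell
# show_match is a hypothetical helper, not part of the answer:
# it prints RSTART, RLENGTH and the cleaned value for one input line.
show_match() {
  printf '%s\n' "$1" | awk '
    # Same pattern as the answer; the slash inside the brackets is written
    # as \/ as a precaution for awks that end a regex literal at a bare slash.
    match($0, /[[:space:]]+\/[^\/]*[[:space:]]+\//) {
      val = substr($0, RSTART, RLENGTH)   # the raw match, here " / Exp /"
      gsub(/[[:space:]]+|\//, "", val)    # drop whitespace and slashes
      print RSTART, RLENGTH, val
    }'
}

show_match 'not done / Exp / * / ok'   # prints: 9 8 Exp
```

The match starts at the space before the first slash (character 9) and spans 8 characters, up to and including the second slash.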



Answer 2:


Another gawk alternative, which relies on gawk's "true multi-dimensional arrays". Contents of tst.awk:

BEGIN {
  FS="[[:blank:]]/[[:blank:]]"
  OFS=" : "
}
FNR>1{
   gsub(/[[:blank:]]/, "", $2)
   gsub(/[[:blank:]]/, "", $3)
   a[$3][$2]++
}
END {
  #PROCINFO["sorted_in"]="@ind_str_desc"
  print "Out" OFS
  for(i in a) {
    print (i=="*"?"Warnings":"Errors") OFS
    t=0
    for(j in a[i]) {
      print "\t" j, a[i][j]
      t+=a[i][j]
    }
    print "Total", t
    t=0
  }
}

gawk -f tst.awk myFile results in:

Out :
Warnings :
        St : 1
        Exp : 2
Total : 3
Errors :
        Ntp : 1
        Yt : 1
Total : 2
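
The FS setting in tst.awk is what keeps the "starting/started" line from producing an extra field: a separator must be a slash with a blank on each side. A small sketch of the difference, where count_fields is a hypothetical helper that prints NF for one line under a given FS:

```shell
# count_fields is a hypothetical helper: print NF for one line under a given FS.
count_fields() {
  printf '%s\n' "$2" | awk -v FS="$1" '{ print NF }'
}

line='The declaration is starting/started /   St      /   *       /  This is not expected'
count_fields '/' "$line"                         # plain slash: prints 5
count_fields '[[:blank:]]/[[:blank:]]' "$line"   # blank-slash-blank, as in tst.awk: prints 4
```

With the regex separator, $2 and $3 are always the code and the marker, which is why tst.awk can index them directly.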


Source: https://stackoverflow.com/questions/63670440/categorize-a-column-and-count-the-number-of-warnings-in-a-column
