Question
I have a file called out.txt with the following contents:
Statement 1 Statement 2 Statement 3 Statement 4
The declaration is not done / Exp / * / This is expected
The declaration is starting/started / St / * / This is not expected
The declaration is not yet designed / Yt / & / This is a major one
The declaration is confirmed / Exp / * / This is okay
The declaration is not confirmed / Ntp / & / This is a major issue
I need to count and categorize the entries by column 3 (Statement 3): a line with * counts as a Warning and a line with & counts as an Error, grouped by the value in column 2, as below:
Out:
Warnings:
Exp : 2
St : 1
Total : 3
Errors:
Yt : 1
Ntp: 1
Total :2
I tried the code below, but it does not produce the expected output:
#!/bin/bash
echo " " ;
File="out.txt"
for z in out.txt;
do
if grep -q "&" $z/"$File"; then
echo "$z:";
awk -F' / '
{ a[$2]++ }
END{ for(j in a){ print j, a[j]; s=s+a[j] };
print "Total :", s}' out.txt
else
echo "$z:";
done
Answer 1:
EDIT2: Since the OP confirmed that there are NO fixed keywords for errors, and that a line is an error when the second-to-last field contains &, try the following.
awk -F'/' '
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  str=$(NF-1)
  gsub(/ +/,"",str)
  if(str=="&"){
    countEr[val]++
  }
  else{
    countSu[val]++
  }
  val=str=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
    print "\t"i,countSu[i]
    sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
    print "\t"i,countEr[i]
    sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
EDIT: A generic solution in which all error names are passed in through a variable, so the conditions do not have to be written out by hand as in my previous solution. Please try the following; it is based on the shown samples only and was written and tested with GNU awk.
awk -v errors="Ntp,Yt" '
BEGIN{
  num=split(errors,arr,",")
  for(i=1;i<=num;i++){
    errorVal[arr[i]]
  }
}
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){
  val=substr($0,RSTART,RLENGTH)
  gsub(/[[:space:]]+|\//,"",val)
  if(val in errorVal){
    countEr[val]++
  }
  else{
    countSu[val]++
  }
  val=""
}
END{
  print "Out:" ORS "Warnings:"
  for(i in countSu){
    print "\t"i,countSu[i]
    sumSu+=countSu[i]
  }
  print "Total:"sumSu
  print "Errors:"
  for(i in countEr){
    print "\t"i,countEr[i]
    sumEr+=countEr[i]
  }
  print "Total:"sumEr
}' Input_file
Explanation: a detailed, commented walkthrough of the previous solution, which hard-codes the error names Ntp and Yt.
awk '  ##Start the awk program here.
match($0,/[[:space:]]+\/[^/]*[[:space:]]+\//){  ##Use match to find spaces, a slash, the value, spaces and a slash, as per the samples.
  val=substr($0,RSTART,RLENGTH)  ##Save the matched text in val: the sub-string starting at RSTART with length RLENGTH.
  gsub(/[[:space:]]+|\//,"",val)  ##Remove all spaces and slashes from val.
  if(val=="Ntp" || val=="Yt"){  ##If the value is either Ntp OR Yt, do the following.
    countEr[val]++  ##Increase the count in array countEr by 1 for index val.
  }
  else{  ##Else do the following.
    countSu[val]++  ##Increase the count in array countSu by 1 for index val.
  }
  val=""  ##Nullify val.
}
END{  ##Start the END block of this program.
  print "Out:" ORS "Warnings:"  ##Print the string Out, a new line and Warnings.
  for(i in countSu){  ##Traverse countSu.
    print "\t"i,countSu[i]  ##Print a tab, the index and its value in countSu.
    sumSu+=countSu[i]  ##Add the current countSu value to the sumSu variable.
  }
  print "Total:"sumSu  ##Print the Total string with the sumSu value.
  print "Errors:"  ##Print the string Errors.
  for(i in countEr){  ##Traverse countEr.
    print "\t"i,countEr[i]  ##Print a tab, the index i and its value in countEr.
    sumEr+=countEr[i]  ##Add the current countEr value to the sumEr variable.
  }
  print "Total:"sumEr  ##Print the Total string with the sumEr value.
}' Input_file  ##Mention the Input_file name here.
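The generic (EDIT) solution expects the error names in -v errors="Ntp,Yt". If typing that list by hand is undesirable, it could be derived from the file itself, using the same rule as EDIT2 (a line is an error when its marker is &). The following is only a sketch under assumptions that go beyond this answer: the fields are separated by exactly " / " as in the sample, the first line is a header, and the function name error_names is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct column-2 values whose column-3
# marker is "&", comma-separated, in order of first appearance.
# Assumes fields are separated by " / " and that line 1 is a header.
error_names() {
    awk -F' / ' '
    FNR > 1 && $3 == "&" && !seen[$2]++ {
        printf "%s%s", sep, $2   # sep is empty before the first name
        sep = ","
    }
    END { print "" }' "$1"
}

# Demo on the sample data from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Statement 1 Statement 2 Statement 3 Statement 4
The declaration is not done / Exp / * / This is expected
The declaration is starting/started / St / * / This is not expected
The declaration is not yet designed / Yt / & / This is a major one
The declaration is confirmed / Exp / * / This is okay
The declaration is not confirmed / Ntp / & / This is a major issue
EOF
error_names "$sample"    # prints: Yt,Ntp
rm -f "$sample"
```

The result can then be handed to the generic script, for example errors=$(error_names Input_file) followed by awk -v errors="$errors" with the same program as above.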
Answer 2:
Another gawk alternative, which relies on gawk's "true multi-dimensional arrays":
$ cat tst.awk
BEGIN {
  FS="[[:blank:]]/[[:blank:]]"
  OFS=" : "
}
FNR>1{
  gsub(/[[:blank:]]/, "", $2)
  gsub(/[[:blank:]]/, "", $3)
  a[$3][$2]++
}
END {
  #PROCINFO["sorted_in"]="@ind_str_desc"
  print "Out" OFS
  for(i in a) {
    print (i=="*"?"Warnings":"Errors") OFS
    t=0
    for(j in a[i]) {
      print "\t" j, a[i][j]
      t+=a[i][j]
    }
    print "Total", t
    t=0
  }
}
gawk -f tst.awk myFile results in:
Out :
Warnings :
St : 1
Exp : 2
Total : 3
Errors :
Ntp : 1
Yt : 1
Total : 2
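Since a[$3][$2] needs gawk 4 or later, a version for other awks may be useful. The sketch below uses the classic composite key count[$3, $2] (joined with SUBSEP) instead of arrays of arrays, and prints Warnings before Errors in a fixed order. It is an illustration rather than part of this answer: the summarize and report names are hypothetical, and it assumes the " / " separator and the header line seen in the sample.

```shell
#!/bin/sh
# Hypothetical helper: summarize FILE by marker ("*" = warning, "&" = error)
# using only POSIX awk features. Assumes " / " separators and a header line.
summarize() {
    awk -F' / ' '
    FNR > 1 {
        count[$3, $2]++              # composite key: marker SUBSEP category
    }
    END {
        print "Out:"
        report("*", "Warnings")
        report("&", "Errors")
    }
    # Print one section; key, parts and total are local variables.
    function report(marker, title,    key, parts, total) {
        print title ":"
        for (key in count) {
            split(key, parts, SUBSEP)
            if (parts[1] == marker) {
                print parts[2] " : " count[key]
                total += count[key]
            }
        }
        print "Total : " (total + 0)  # + 0 prints 0 for an empty section
    }' "$1"
}

# Usage: summarize myFile
```

The section order is fixed here, but the order of the categories inside each section still depends on the awk implementation, because the iteration order of for (key in count) is unspecified.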
Source: https://stackoverflow.com/questions/63670440/categorize-a-column-and-count-the-number-of-warnings-in-a-column