Bash script that analyzes report files

Question

I have the following bash script which I will use to analyze all report files in the current directory:

#!/bin/bash    


# methods
analyzeStructuralErrors()
{ 
    : # do something with $1 (the ":" no-op keeps the empty function body valid)
}

# main
reportFiles=$(find "$PWD" -name "*_report*.txt")
for f in $reportFiles    # note: still splits on whitespace in file names
do
    echo "Processing $f"
    analyzeStructuralErrors "$f"
done

My report files are formatted like this:

Error Code for Issue X - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
   1143-1-1411-247-1-72953-1
   1143-2-1411-247-436-72953-1
   2211-1-1888-204-442-22222-1
Error Code for Issue Y - Description Text - Number of errors.
col1_name,col2_name,col3_name,col4_name,col5_name,col6_name
   Other data
   .
   .
   .

I'm looking for a way to go through each file and aggregate the report data. In the above example, we have two unique issues of type X, which I would like to handle in analyzeStructuralErrors. Other types of issues can be ignored in this routine. Can anyone offer advice on how to do this? Basically, I want to read each line until I hit the next error header and put that data into some kind of data structure.

Answer 1:

As suggested by Dave Jarvis, awk:

handles this better than bash
is fairly easy to learn
is likely available wherever bash is available

I've never had to look farther than The AWK Manual.

It would make things easier if you used a consistent field separator for both the list of column names and the data. Perhaps you could do some pre-processing in a bash script using sed before feeding to awk. Anyway, take a look at multi-dimensional arrays and reading multiple lines in the manual.
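
The pre-processing idea above could be sketched roughly as follows. This is only one possible approach. It assumes the data lines consist solely of leading spaces, digits and dashes, as in the sample report, and the function name normalize_report is made up for illustration.

```shell
#!/bin/bash

# normalize_report (hypothetical helper): rewrite data lines such as
#   "   1143-1-1411-247-1-72953-1"
# as
#   "1143,1,1411,247,1,72953,1"
# so that they use the same comma separator as the column-name lines.
# Assumption: data lines contain only leading spaces, digits and dashes.
# Header and column-name lines do not match the address and pass through unchanged.
normalize_report()
{
    sed -E '/^ +[0-9-]+$/ { s/^ +//; s/-/,/g; }' "$1"
}

# Possible use: feed the normalized text to awk with a comma field separator.
# normalize_report some_report.txt | awk -F, '/^[0-9]/ { n[$1]++ } END { for (k in n) print k, n[k] }'
```

After this step, awk can address both the column names and the data fields with -F, instead of needing a separate split on dashes.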

Answer 2:

Below is a working awk implementation that uses its pseudo-multidimensional arrays. I've included sample output to show how it looks. I took the liberty of adding a 'Count' column to show how many times a given "Issue" was hit for each Error Code.

#!/bin/bash

awk '
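 # Header line: remember the code of the current section (5th field, e.g. X)
 /Error Code for Issue/ {
   errCode[currCode=$5]=$5
 }
 # Data line: count its first dash-separated field (leading spaces are kept)
 /^ +[0-9-]+$/ {
   split($0, tmpArr, "-")
   error[errCode[currCode],tmpArr[1]]++
 }
 # Print one block per code; each key of error is "code SUBSEP issue"
 END {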
   for (code in errCode) {
     printf("Error Code: %s\n", code)
     for (item in error) {
       split(item, subscr, SUBSEP)
       if (subscr[1] == code) {
         printf("\tIssue: %s\tCount: %s\n", subscr[2], error[item])
       }
     }
   }
 }
' *_report*.txt

Output

$ ./report.awk
Error Code: B
        Issue:    1212  Count: 3
Error Code: X
        Issue:    2211  Count: 1
        Issue:    1143  Count: 2
Error Code: Y
        Issue:    2961  Count: 1
        Issue:    6666  Count: 1
        Issue:    5555  Count: 2
        Issue:    5911  Count: 1
        Issue:    4949  Count: 1
Error Code: Z
        Issue:    2222  Count: 1
        Issue:    1111  Count: 1
        Issue:    2323  Count: 2
        Issue:    3333  Count: 1
        Issue:    1212  Count: 1
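
For readers unfamiliar with the array trick used above: awk stores an element such as error[a, b] under a single string key made of a and b joined by the built-in SUBSEP variable, which is why the END block has to split each key apart again. A tiny standalone sketch with made-up values (the function name subsep_demo is hypothetical):

```shell
#!/bin/bash

# subsep_demo (hypothetical name): show how awk emulates a 2-D array.
subsep_demo()
{
    awk 'BEGIN {
      # The subscript list ["X", "1143"] becomes the single key "X" SUBSEP "1143".
      count["X", "1143"] = 2

      # Membership tests put the subscript list in parentheses.
      if (("X", "1143") in count)
        print "found"

      # Recover both parts of each key by splitting on SUBSEP.
      for (key in count) {
        split(key, part, SUBSEP)
        print part[1], part[2], count[key]
      }
    }'
}

subsep_demo
```

This prints "found" followed by "X 1143 2". With more than one element, the order in which for (key in count) visits keys is unspecified, so output that must be ordered should be piped through sort.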

Answer 3:

Bash has one-dimensional arrays that are indexed by integers. Bash 4 adds associative arrays. That's it for data structures. AWK has one-dimensional associative arrays and fakes its way through two-dimensional arrays. If you need a data structure more advanced than that, you'll need another language, such as Python.
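
As a rough illustration of the Bash 4 associative arrays mentioned above, the following sketch counts how often each first field occurs in the Issue X sections of one file. It assumes Bash 4 or later and the report layout shown in the question; the function name countIssueX is invented for this example.

```shell
#!/bin/bash

# countIssueX (hypothetical name): print "<first field> <count>" for every
# data line found in an "Issue X" section of the file given as $1.
# Requires Bash 4+ for associative arrays (local -A / declare -A).
countIssueX()
{
    local line key inX=false
    local -A counts=()

    # read strips the leading spaces of each data line (default IFS).
    while read -r line
    do
        if [[ $line == "Error Code for Issue "* ]]
        then
            # A header line starts a new section; remember whether it is X.
            if [[ $line == "Error Code for Issue X "* ]]
            then
                inX=true
            else
                inX=false
            fi
        elif $inX && [[ $line =~ ^[0-9]+(-[0-9]+)+$ ]]
        then
            key=${line%%-*}    # text before the first dash
            counts[$key]=$(( ${counts[$key]:-0} + 1 ))
        fi
    done < "$1"

    # Key order of an associative array is unspecified, so sort the output.
    for key in "${!counts[@]}"
    do
        echo "$key ${counts[$key]}"
    done | sort
}
```

Run against the sample report from the question, this would print "1143 2" and "2211 1". Anything more structured than a flat key-to-value map (for example, keeping every full row per key) quickly becomes awkward in Bash, which is the point made above.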

That said, here's a rough outline of how you might parse the data you've shown.

#!/bin/bash    

# methods
analyzeStructuralErrors()
{ 
    local f=$1
    local Xpat="Error Code for Issue X"
    local notXpat="Error Code for Issue [^X]"
    local flag=false line issue field
    local -a columns=() issues=() array=()
    while read -r line
    do
        if [[ $line =~ $Xpat ]]
        then
            flag=true
        elif [[ $line =~ $notXpat ]]
        then
            flag=false
        elif $flag && [[ $line =~ , ]]
        then
            # columns will be overwritten if there is more than one X section
            IFS=, read -ra columns <<< "$line"
        elif $flag && [[ $line =~ - ]]
        then
            issues+=("$line")
        elif $flag
        then
            # only complain inside an X section; other sections are ignored
            echo "unrecognized data line"
            echo "$line"
        fi
    done < "$f"

    for issue in "${issues[@]}"
    do
        IFS=- read -ra array <<< "$issue"
        # do something with ${array[0]}, ${array[1]}, etc.
        # or iterate
        for field in "${array[@]}"
        do
            : # do something with $field
        done
    done
}

# main
find . -name "*_report*.txt" | while read -r f
do
    echo "Processing $f"
    analyzeStructuralErrors "$f"
done

Source: https://stackoverflow.com/questions/4443583/bash-script-that-analyzes-report-files

Tags

bash

reporting