I am dealing with the analysis of a large number of .dlg text files located within the workdir. Each file has a table (usually at a different position in each log) in the following format:
Probably makes more sense as an Awk script. This picks the line with the widest histogram from each input file; in the case of a tie within a file, it keeps the first such line.
#!/bin/bash
awk 'FNR == 1 { if(sel) print sel; sel = ""; max = 0 }
FNR < 9 { next }
length($10) > max { max = length($10); sel = FILENAME ":" $0 }
END { if (sel) print sel }' ./"$prot"/*.dlg
This assumes the histograms are always the tenth field; if your input format is even messier than the lump you show, maybe adapt to taste.
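For example, if the bar turns out not to be the tenth field after all, a variant like this (just a sketch, assuming the bar is the only run of # characters on a line) measures the run directly instead of relying on a field position:

awk 'FNR == 1 { if (sel) print sel; sel = ""; max = 0 }
FNR < 9 { next }
# measure the first run of "#" on the line; RLENGTH is its width (-1 when there is none)
match($0, /#+/) && RLENGTH > max { max = RLENGTH; sel = FILENAME ":" $0 }
END { if (sel) print sel }' ./"$prot"/*.dlg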
In some more detail, the first line of the script triggers on the first line of each input file. If we have collected a previous line (meaning this is not the first input file), print that; either way, start over by setting sel to nothing and max to zero.
The second line skips lines 1-8, which contain the header.
The third line checks whether the current line's histogram is longer than max. If it is, update max to this histogram's length, and remember the current line in sel.
The last line is the spillover for when we have processed all the files. We never printed the sel from the last file, so print that too, if it's set.
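As an aside, if you have GNU Awk, the per-file reset and the END spillover can be folded into a single ENDFILE block. This is gawk-specific and only a sketch of the same logic:

gawk 'FNR < 9 { next }
length($10) > max { max = length($10); sel = FILENAME ":" $0 }
# ENDFILE runs after each input file: print the widest line we kept, then reset
ENDFILE { if (sel) print sel; sel = ""; max = 0 }' ./"$prot"/*.dlg

The output should be the same filename:line pairs, just without the bookkeeping on FNR == 1.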
If you mean to say we should find the lines between CLUSTERING HISTOGRAM and the end of the table, we should probably have more information about what the surrounding lines look like. Maybe something like this, though:
awk '/CLUSTERING HISTOGRAM/ { if (sel) print sel; looking = 1; sel = ""; max = 0 }
!looking { next }
looking > 1 && $1 != looking { looking = 0; nextfile }
$1 == looking && length($10) > max { max = length($10); sel = FILENAME ":" $0 }
$1 == looking { looking++ }
END { if (sel) print sel }' ./"$prot"/*.dlg
This sets looking to 1 when we see CLUSTERING HISTOGRAM, then counts it up for each table row whose first field matches it (assuming the rows are numbered 1, 2, 3, ... in the first column). The first line where the numbering no longer keeps up marks the end of the table, so we stop looking and skip to the next file.
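And if by the lines between CLUSTERING HISTOGRAM and the end of the table you actually mean that the whole table should be printed, not just its widest line, the same counting trick can be reused. Again only a sketch, under the same assumption about consecutive numbering in the first field; it prints everything from the CLUSTERING HISTOGRAM line until the numbering stops:

awk '/CLUSTERING HISTOGRAM/ { looking = 1 }
!looking { next }
looking > 1 && $1 != looking { looking = 0; nextfile }
# print every line of the region, header rows included
{ print FILENAME ":" $0 }
$1 == looking { looking++ }' ./"$prot"/*.dlg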