Bash: looping over text files and extracting a fragment from each

长情又很酷 2021-01-23 13:26

I am analyzing a large number of dlg text files located in the workdir. Each file has a table (usually located in different positions of the log) in the follo

3 answers
  •  没有蜡笔的小新
    2021-01-23 13:51

    Probably makes more sense as an Awk script.

    For each input file, this prints the line with the widest histogram; in the case of a tie within a file, it picks the first such line.

    #!/bin/bash
    
    awk 'FNR == 1 { if(sel) print sel; sel = ""; max = 0 }
       FNR < 9 { next }
       length($10) > max { max = length($10); sel = FILENAME ":" $0 }
       END { if (sel) print sel }' ./"$prot"/*.dlg
    

    This assumes the histograms are always the tenth field; if your input format is even messier than the lump you show, maybe adapt to taste.
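
    If the column count does vary, one possible adaptation is to split on the pipe character and measure the last field instead of counting whitespace-separated fields. The sketch below is only an illustration: the row layout is an assumption (the question's table is not reproduced here), and the variable names `row`, `tenth` and `width` are made up for the example.

    ```shell
    # Hypothetical row layout, assumed for illustration only:
    #   rank | lowest energy | run | mean energy | count |histogram
    row='   3 |     -5.47 |  17 |     -5.44 |   2 |##'

    # Default whitespace splitting: the tenth field still carries the pipe.
    tenth=$(printf '%s\n' "$row" | awk '{ print $10 }')

    # Splitting on "|": the last field is the histogram alone. Everything
    # that is not a "#" is stripped before measuring the bar width.
    width=$(printf '%s\n' "$row" |
        awk -F'|' '{ h = $NF; gsub(/[^#]/, "", h); print length(h) }')

    echo "tenth field: $tenth, histogram width: $width"
    ```

    Because every row carries the same leading pipe in its tenth field, comparing `length($10)` across rows still ranks the histograms correctly; the pipe-based variant only matters when the number of whitespace-separated fields changes from row to row.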

    In some more detail, the first line triggers on the first line of each input file. If we have collected a previous line (meaning this is not the first input file), print that, and start over. Otherwise, initialize for the first input file. Set sel to nothing and max to zero.

    The second line skips lines 1-8 which contain the header.

    The third line checks if the current line's histogram is longer than max. If it is, update max to this histogram's length, and remember the current line in sel.

    The last line is spillover for when we have processed all files. We never printed the sel from the last file, so print that too, if it's set.
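
    To see the walkthrough above in action, here is a small self-contained run. The two sample files, their contents and the `workdir` and `result` variables are hypothetical; they only assume eight header lines followed by rows whose tenth whitespace-separated field is the histogram.

    ```shell
    # Build two hypothetical .dlg files: 8 header lines, then table rows.
    workdir=$(mktemp -d)
    for name in a b; do
        for i in 1 2 3 4 5 6 7 8; do
            echo "header line $i"
        done > "$workdir/$name.dlg"
    done
    printf '%s\n' \
        '   1 |     -5.78 |  11 |     -5.78 |   1 |#' \
        '   2 |     -5.47 |  17 |     -5.44 |   3 |###' \
        '   3 |     -5.10 |   4 |     -5.02 |   3 |###' >> "$workdir/a.dlg"
    printf '%s\n' \
        '   1 |     -6.01 |   2 |     -5.90 |   2 |##' \
        '   2 |     -5.20 |   9 |     -5.20 |   1 |#' >> "$workdir/b.dlg"

    # Same program as in the answer, run against the sample files.
    result=$(awk 'FNR == 1 { if (sel) print sel; sel = ""; max = 0 }
        FNR < 9 { next }
        length($10) > max { max = length($10); sel = FILENAME ":" $0 }
        END { if (sel) print sel }' "$workdir"/*.dlg)

    printf '%s\n' "$result"
    rm -r "$workdir"
    ```

    In `a.dlg`, ranks 2 and 3 tie on histogram width, so the earlier one (rank 2) is kept because the comparison uses `>` rather than `>=`. For `b.dlg`, the selected row is rank 1, and it is printed by the `END` rule since no further file follows.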

    If you mean that we should find the lines between CLUSTERING HISTOGRAM and the end of the table, we should probably have more information about what the surrounding lines look like. Maybe something like this, though:

    awk '/CLUSTERING HISTOGRAM/ { if (sel) print sel; looking = 1; sel = ""; max = 0 }
       !looking { next }
       looking > 1 && $1 != looking { looking = 0; nextfile }
       $1 == looking && length($10) > max { max = length($10); sel = FILENAME ":" $0 }
       $1 == looking { looking++ }
       END { if (sel) print sel }' ./"$prot"/*.dlg
    

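    This sets looking to 1 when we see CLUSTERING HISTOGRAM, then counts up with each table row whose first field (the cluster rank) equals looking. The first line after that whose first field does not match the counter marks the end of the table, and we skip the rest of the file.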
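
    Here is a runnable sketch of that counting idea on made-up data. It spells out a rule that advances `looking` after each matching row, so the counter tracks the cluster rank. The file contents are hypothetical, `workdir` and `result` are names chosen for the example, and `nextfile` needs an awk that supports it (gawk and recent mawk do).

    ```shell
    workdir=$(mktemp -d)
    # a.dlg: log noise, the table, then a stray row that must be ignored.
    printf '%s\n' \
        'some log output' \
        '    CLUSTERING HISTOGRAM' \
        '_____|___________|_____|___________|_____|________' \
        '   1 |     -5.78 |  11 |     -5.78 |   1 |#' \
        '   2 |     -5.47 |  17 |     -5.44 |   4 |####' \
        '   3 |     -5.10 |   4 |     -5.02 |   2 |##' \
        '_____|___________|_____|___________|_____|________' \
        '   1 |      0.00 |   0 |      0.00 |   9 |#########' > "$workdir/a.dlg"
    # b.dlg: the table starts right at the top of the file.
    printf '%s\n' \
        '    CLUSTERING HISTOGRAM' \
        '   1 |     -6.01 |   2 |     -5.90 |   2 |##' \
        '   2 |     -5.20 |   9 |     -5.20 |   1 |#' \
        'end of table' > "$workdir/b.dlg"

    result=$(awk '/CLUSTERING HISTOGRAM/ { if (sel) print sel; looking = 1; sel = ""; max = 0 }
        !looking { next }                                       # outside any table
        looking > 1 && $1 != looking { looking = 0; nextfile }  # table has ended
        $1 == looking && length($10) > max { max = length($10); sel = FILENAME ":" $0 }
        $1 == looking { looking++ }                             # expect the next rank
        END { if (sel) print sel }' "$workdir"/*.dlg)

    printf '%s\n' "$result"
    rm -r "$workdir"
    ```

    The stray nine-hash row at the end of `a.dlg` never reaches the comparison, because the separator line before it ends the table and triggers `nextfile`. Note that this relies on each table being followed by at least one non-matching line within the same file.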
