How to handle 3 files with awk?

夙愿已清 submitted on 2019-12-03 08:46:18
mklement0

Update: The solution below works, as long as all input files are nonempty, but see @Ed Morton's answer for a simpler and more robust way of adding file-specific handling.

However, this answer still provides a hopefully helpful explanation of some awk basics and why the OP's approach didn't work.


Try the following (note that I've made the indices 1-based, as that's how awk does it):

awk '

 # Increment the current-file index, if a new file is being processed.
 FNR == 1 { ++fIndex }

 # Process current line if from 1st file.
 fIndex == 1 {
    print "file 1: " FILENAME
    next
 }

 # Process current line if from 2nd file.
 fIndex == 2 {
    print "file 2: " FILENAME
    next
 }

 # Process current line (from all remaining files).
 {
    print "file " fIndex ": " FILENAME
 }

' file-1 file-2 file-3
  • Pattern FNR==1 is true whenever processing of a new input file starts (FNR contains the line number relative to the current input file).
  • Every time a new file starts processing, fIndex is incremented and thus reflects the 1-based index of the current input file. Tip of the hat to @twalberg's helpful answer.
    • Note that an uninitialized awk variable used in a numeric context defaults to 0, so there's no need to initialize fIndex (unless you want a different start value).
  • Patterns such as fIndex == 1 can then be used to execute blocks for lines from a specific input file only (assuming the block ends in next).
  • The last block is then executed for all input files that don't have file-specific blocks (above).
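
To see the counter at work, here is a small sketch that traces NR, FNR and fIndex for each input line. The file names and contents (a.txt, b.txt, empty.txt) are made up for illustration. The second run adds an empty file to show the limitation mentioned in the update at the top: an empty file has no first line, so FNR == 1 never fires for it and the index falls behind.

```shell
# Hypothetical sample files, created in a temporary directory.
dir=$(mktemp -d)
printf 'a1\na2\n' > "$dir/a.txt"
printf 'b1\n'     > "$dir/b.txt"
: > "$dir/empty.txt"

# Run 1: print overall line number, per-file line number and file index.
trace=$(cd "$dir" && awk '
  FNR == 1 { ++fIndex }
  { print "NR=" NR, "FNR=" FNR, "fIndex=" fIndex, FILENAME }
' a.txt b.txt)
printf '%s\n' "$trace"

# Run 2: with an empty file in the middle, b.txt is the 3rd argument
# but is reported with index 2, because empty.txt never matched FNR == 1.
skewed=$(cd "$dir" && awk '
  FNR == 1 { ++fIndex }
  { print fIndex, FILENAME }
' a.txt empty.txt b.txt)
printf '%s\n' "$skewed"

rm -rf "$dir"
```

In the first run, FNR drops back to 1 on the first line of b.txt while NR keeps counting, which is exactly the moment fIndex moves from 1 to 2.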

As for why your approach didn't work:

  • Your 2nd and 3rd blocks are potentially executed unconditionally, for lines from all input files, because they are not preceded by a pattern (condition).

  • So your 2nd block is entered for lines from all subsequent input files, and its next statement then prevents the 3rd block from ever getting reached.
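
The effect can be reproduced with a minimal sketch. The program below is only an assumed shape for the kind of script described here (a first block guarded by NR == FNR, followed by two blocks without patterns), not the original code; the file names f1, f2, f3 and their contents are placeholders.

```shell
# Three hypothetical one-line input files.
dir=$(mktemp -d)
printf 'a\n' > "$dir/f1"
printf 'b\n' > "$dir/f2"
printf 'c\n' > "$dir/f3"

out=$(cd "$dir" && awk '
  NR == FNR { print "block 1: " $0; next }   # true only while reading the 1st file
  { print "block 2: " $0; next }             # no pattern: runs for every remaining line
  { print "block 3: " $0 }                   # never reached, block 2 always calls next
' f1 f2 f3)
printf '%s\n' "$out"

rm -rf "$dir"
```

The line from f3 is handled by block 2, and block 3 produces no output at all.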

Potential misconceptions:

  • Perhaps you think that each block functions as a loop processing a single input file. This is NOT how awk works. Instead, the entire awk program is processed in a loop, with each iteration processing a single input line, starting with all lines from file 1, then from file 2, ...

  • An awk program can have any number of blocks (typically preceded by patterns), and whether they're executed for the current input line is solely governed by whether the pattern evaluates to true; if there is no pattern, the block is executed unconditionally (across input files). However, as you've already discovered, next inside a block can be used to skip subsequent blocks (pattern-block pairs).
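
A short sketch of this per-line evaluation, with made-up file names and contents: neither block uses next, so every line is tested against both blocks in order before awk moves on to the following line.

```shell
# Hypothetical input: f1 has two lines, f2 has one.
dir=$(mktemp -d)
printf 'x1\ny1\n' > "$dir/f1"
printf 'x2\n'     > "$dir/f2"

out=$(cd "$dir" && awk '
  /x/ { print "A saw " $0 }   # runs only for lines containing "x"
  { print "B saw " $0 }       # no pattern: runs for every line of every file
' f1 f2)
printf '%s\n' "$out"

rm -rf "$dir"
```

The output interleaves the two blocks line by line (A and B for x1, then B for y1, then A and B for x2) instead of block A finishing one file before block B starts.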

If you have gawk, just test ARGIND:

awk '
ARGIND == 1 { print "file 1: " $0; next }   # file 1 handling goes here
ARGIND == 2 { print "file 2: " $0; next }   # file 2 handling goes here
' file1 file2

If you don't have gawk, get it.

In other awks though you can just test for the file name:

awk '
FILENAME == ARGV[1] { print "file 1: " $0; next }   # file 1 handling goes here
FILENAME == ARGV[2] { print "file 2: " $0; next }   # file 2 handling goes here
' file1 file2

That only fails if you want to parse the same file twice; in that case, you need to add a count of the number of times that file has been opened.
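
One way to keep such a count, as a sketch only: increment a per-file counter on the first line of each pass and branch on its value. The array name opened and the file name data are chosen here for illustration, and the approach assumes the file is nonempty (an empty file never matches FNR == 1).

```shell
# A hypothetical two-line file that is passed to awk twice.
dir=$(mktemp -d)
printf 'k1\nk2\n' > "$dir/data"

out=$(cd "$dir" && awk '
  FNR == 1 { ++opened[FILENAME] }                       # count each pass over this file
  opened[FILENAME] == 1 { print "pass 1: " $0; next }   # first time through
  { print "pass 2: " $0 }                               # any later pass
' data data)
printf '%s\n' "$out"

rm -rf "$dir"
```

Because the counter is keyed by FILENAME, the same test still works when other files appear between the two mentions of data.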

Perhaps you need to consider adding some additional structure like this:

BEGIN { file_number = 0 }    # becomes 1 on the first line of the first file
FNR == 1 { ++file_number }
file_number == 3 && /something_else/ { ...}