Why doesn't piping to the same file work on some platforms?

谎友^ 2020-12-20 22:22

In Cygwin, the following code works fine:

$ cat junk
bat
bat
bat

$ cat junk | sort -k1,1 | tr 'b' 'z' > junk

$ cat junk
zat
zat
zat

5 Answers
  •  爱一瞬间的悲伤
    2020-12-20 23:00

    Overwriting the same file in a pipeline is not advisable, because if you make a mistake you can't get the data back (unless you have a backup or the file is under version control).

    This happens because the commands in a pipeline run in parallel, and their input and output are buffered automatically (which can give the impression that it works). Different platforms may buffer and schedule them differently (depending on their settings), so on some you end up with an empty file (because the shell creates or truncates the output file at the start), and on others with a half-finished file.
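A minimal sketch of the "empty file" case, assuming a POSIX shell: with `>` the shell opens the output file with truncation while it sets up the redirection, before the command it belongs to has read anything. The file name junk is reused from the question; this single-command version has no race, so the result is always an empty file.

```shell
#!/bin/sh
# Recreate the three-line file from the question.
printf 'bat\nbat\nbat\n' > junk

# The shell truncates junk for "> junk" before cat is started,
# so cat reads an already-empty file and writes nothing.
cat junk > junk

# Print the size of junk in bytes (it is now empty).
wc -c < junk
```

In a pipeline the same truncation happens in the process running the last command, concurrently with `cat` reading the file, which is why the outcome differs between platforms.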

    The solution is to use a method in which the file is only overwritten once the input has been read to EOF and fully buffered and processed.
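One common way to get this behaviour with plain tools (a swapped-in technique, not one the answer itself names) is to write the pipeline's output to a temporary file and replace the original only after the pipeline has finished. A sketch, assuming `mktemp` is available; junk and the sort/tr pipeline come from the question:

```shell
#!/bin/sh
set -e
printf 'bat\nbat\nbat\n' > junk

# Write to a temporary file first; junk itself is not opened for
# writing while the pipeline is still reading it.
tmp=$(mktemp)
sort -k1,1 junk | tr 'b' 'z' > "$tmp"

# Replace the original only once the output is complete.
mv "$tmp" junk
cat junk
```

Because `mv` replaces junk with the temporary file, junk ends up with the permissions `mktemp` gave it (typically readable and writable only by the owner); using `cat "$tmp" > junk; rm -f "$tmp"` instead keeps the original file's permissions.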

    This can be achieved by:

    • Using a utility that soaks up all of its input before opening the output file.

      This can be done with sponge from moreutils (the opposite of unbuffer from the expect package).

    • Avoiding I/O redirection syntax (which can truncate the file to empty before the command starts).

      For example, using tee:

      cat junk | sort | tee junk
      

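      This can only work with sort in the pipeline, because sort has to read all of its input before it writes anything. So if your command doesn't use sort, add one. Even then it isn't guaranteed: tee opens and truncates junk as soon as it starts, which may happen before cat has finished reading the file.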

      Another tool that can be used is stdbuf, which modifies the buffering of a command's standard streams and lets you specify the buffer size.

    • Using a text processor that can edit files in place (such as sed or ex).

      Example:

      $ ex -s +'%!sort -k1' -cxa myfile.txt
      $ sed -i '' s/foo/bar/g myfile.txt    # BSD/macOS sed
      $ sed -i s/foo/bar/g myfile.txt       # GNU sed
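If sponge (from moreutils) isn't installed, its "soak up all input first" behaviour can be approximated in a POSIX shell. The function below, `soak`, is a hypothetical helper written for this example (it is not part of moreutils or any other package), and it assumes `mktemp` is available. It reads standard input to EOF into a temporary file and only then opens the target file for writing.

```shell
#!/bin/sh
# soak FILE: hypothetical sponge-like helper (illustration only).
# Reads all of stdin to EOF before opening FILE for writing.
soak() {
    _soak_tmp=$(mktemp) || return 1
    # Buffer the entire input in a temporary file first.
    cat > "$_soak_tmp" || { rm -f "$_soak_tmp"; return 1; }
    # Only now truncate and rewrite the target (same inode, same permissions).
    cat "$_soak_tmp" > "$1"
    _soak_status=$?
    rm -f "$_soak_tmp"
    return "$_soak_status"
}

printf 'bat\nbat\nbat\n' > junk
# The question's pipeline, with "> junk" replaced by "soak junk".
cat junk | sort -k1,1 | tr 'b' 'z' | soak junk
cat junk
```

This is safe regardless of timing: `soak` cannot reach EOF on its input until `cat junk` has finished reading the file and the rest of the pipeline has exited, so junk is never truncated while it is still being read.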
      
