Why doesn't piping to the same file work on some platforms?


Question


In Cygwin, the following works fine:

$ cat junk
bat
bat
bat

$ cat junk | sort -k1,1 |tr 'b' 'z' > junk

$ cat junk
zat
zat
zat

But in the Linux shell (GNU/Linux), it seems that overwriting doesn't work:

[41] othershell: cat junk
cat
cat
cat
[42] othershell: cat junk |sort -k1,1 |tr 'c' 'z'
zat
zat
zat
[43] othershell: cat junk |sort -k1,1 |tr 'c' 'z' > junk
[44] othershell: cat junk

Both environments run BASH.

I am asking because, due to this caveat, I am sometimes forced to create a temporary file after doing text manipulation. I know that in Perl you can pass the -i flag to overwrite the original file after some operations/manipulations. Is there any foolproof method in a Unix pipeline for overwriting the input file that I am not aware of?
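For comparison, Perl's in-place flag looks roughly like this (a minimal sketch covering only the tr step; -i.bak keeps a backup copy, and the sort would still need a separate pass):

$ perl -i.bak -pe 'tr/b/z/' junk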


Answer 1:


In general this can be expected to break. The processes in a pipeline are all started up in parallel, so the > junk at the end of the line will usually truncate your input file before the process at the head of the pipeline has finished (or even started) reading from it.

Even if bash under Cygwin lets you get away with this, you shouldn't rely on it. The general solution is to redirect to a temporary file and then rename it once the pipeline is complete.
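A minimal sketch of that approach, using junk.tmp as an illustrative temporary name:

$ sort -k1,1 junk | tr 'b' 'z' > junk.tmp && mv junk.tmp junk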




Answer 2:


Four main points here:

  1. "Useless use of cat." Don't do that.
  2. You're not actually sorting anything with sort. Don't do that.
  3. Your pipeline doesn't say what you think it does. Don't do that.
  4. You're trying to overwrite a file in-place while reading from it. Don't do that.

One of the reasons you are getting inconsistent behavior is that you are piping to a process that has redirection, rather than redirecting the output of the pipeline as a whole. The difference is subtle, but important.

What you want is to create a compound command with Command Grouping, so that you can redirect the input and output of the whole pipeline. In your case, this should work properly:

{ sort -k1,1 | tr 'c' 'z'; } < junk > sorted_junk

Please note that without anything to sort, you might as well skip the sort command too. Then your command can be run without the need for command grouping:

tr 'c' 'z' < junk > sorted_junk

Keep redirections and pipelines as simple as possible. It makes debugging your scripts much easier.

However, if you still want to abuse the pipeline for some reason, you could use the sponge utility from the moreutils package. The man page says:

sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

So, your original command line can be rewritten like this:

cat junk | sort -k1,1 | tr 'c' 'z' | sponge junk

and since junk will not be overwritten until sponge receives EOF from the pipeline, you will get the results you were expecting.
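If sponge is not available on your system, it typically ships in the moreutils package; on a Debian-based distribution, for example:

$ sudo apt install moreutils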




Answer 3:


If you want to edit that file, you can just use an editor: here ex filters the whole buffer through the pipeline (%!) and then x writes the file and exits.

ex junk << EOF
%!(sort -k1,1 |tr 'b' 'z')
x
EOF



Answer 4:


Overwriting the same file in a pipeline is not advisable, because once you make a mistake you cannot get the data back (unless you have a backup or the file is under version control).

This happens because the input and output in a pipeline are automatically buffered (which can give the impression that it works), but the commands actually run in parallel. Different platforms may buffer the output in different ways (depending on the settings), so on some you end up with an empty file (because the output file is created, and truncated, right at the start), and on others with a half-finished file.

The solution is to use a method that only overwrites the file once the whole input has been read (EOF), buffered, and processed.

This can be achieved by:

  • Use a utility which soaks up all its input before opening the output file.

    This can be done with sponge (roughly the opposite of unbuffer from the expect package).

  • Avoid I/O redirection syntax (which can create an empty file before the command even starts).

    For example, use tee (which buffers its standard streams):

    cat junk | sort | tee junk
    

    This only works with sort, because sort needs to read all of its input before it can produce any output. So if your command doesn't use sort, add one.

    Another tool which can be used is stdbuf, which modifies the buffering of a command's standard streams and lets you specify a buffer size.

  • Use a text processor which can edit files in-place (such as sed or ex); a sketch applied to the original file follows the examples below.

    Examples:

    $ ex -s +'%!sort -k1' -cxa myfile.txt
    $ sed -i 's/foo/bar/g' myfile.txt
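    Applied to the original junk file, a minimal GNU sed sketch using its y (transliterate) command might look like this; sorting would still need a separate pass:

    $ sed -i 'y/b/z/' junk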
    



Answer 5:


Using the following simple script, you can make it work the way you want:

$ cat junk | sort -k1,1 |tr 'b' 'z' | overwrite_file.sh junk

overwrite_file.sh

#!/usr/bin/env bash

# Soak up all of stdin before touching the output file
# (note: command substitution strips trailing newlines).
OUT=$(cat -)

# Treat the script's arguments as the target filename.
FILENAME="$*"

# Write the buffered input to the file and echo it to stdout.
echo "$OUT" | tee "$FILENAME"

Note that if you don't want the updated file to be sent to stdout, you can use this approach instead:

overwrite_file_no_output.sh

#!/usr/bin/env bash

OUT=$(cat -)

FILENAME="$*"

echo "$OUT" > "$FILENAME"


Source: https://stackoverflow.com/questions/10586623/why-piping-to-the-same-file-doesnt-work-on-some-platforms
