xargs: losing output when redirecting stdout to a file in parallel mode

Posted by 故事扮演 on 2019-12-18 17:06:13

Question


I am using GNU xargs (version 4.2.2) in parallel mode and I seem to be reliably losing output when redirecting to a file. When redirecting to a pipe, it appears to work correctly.

The following shell commands demonstrate a minimal, complete, and verifiable example of the issue. I generate 2550 numbers and use xargs to split them into lines of 100 args each, for a total of 26 lines, where the 26th line contains only 50 args.

# generate numbers 1 to 2550 where each number is on its own line
$ seq 1 2550 > /tmp/nums
$ wc -l /tmp/nums
2550 /tmp/nums

# piping to wc is accurate: 26 lines, 2550 args
$ xargs -P20 -n 100 </tmp/nums | wc
     26    2550   11643

# redirecting to a file is clearly inaccurate: 22 lines, 2150 args
$ xargs -P20 -n 100 </tmp/nums >/tmp/out; wc /tmp/out
     22  2150 10043 /tmp/out

I believe the problem is not related to the underlying shell, since the shell performs the redirection before the command executes and then waits for xargs to complete. My hypothesis is that xargs is completing before flushing the buffer. However, if that hypothesis is correct, I do not know why the problem doesn't manifest when writing to a pipe.

Edit:

It appears that when using >> (create/append to file) in the shell, the problem doesn't manifest:

# appending to file
$ >/tmp/out
$ xargs -P20 -n 100 </tmp/nums >>/tmp/out; wc /tmp/out
     26    2550   11643

# creating and appending to file
$ rm /tmp/out
$ xargs -P20 -n 100 </tmp/nums >>/tmp/out; wc /tmp/out
     26    2550   11643
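
A stricter check than wc is to split the output back into one number per line, sort it, and compare it with the input. The following is a minimal sketch of that check applied to the appending variant above; the helper name check_complete and the temporary working directory are assumptions for illustration, not part of the original commands.

```shell
# Hypothetical helper: succeeds only if every number from the input
# appears exactly once in the output, regardless of line order.
check_complete() {
    # $1 = file written by xargs, $2 = original one-number-per-line input
    tr -s ' ' '\n' < "$1" | sort -n | cmp -s - "$2"
}

workdir=$(mktemp -d)
seq 1 2550 > "$workdir/nums"

# Same invocation as above, in append mode
xargs -P20 -n 100 < "$workdir/nums" >> "$workdir/out"

if check_complete "$workdir/out" "$workdir/nums"; then
    echo "complete"
else
    echo "output lost or corrupted"
fi
```

Unlike the line and word counts from wc, this comparison also catches output in which numbers were overwritten or duplicated while the totals happened to match.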

Answer 1:


Your problem is caused by the output from different processes being mixed. The following commands demonstrate it:

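# create six files a..f, each holding one very long line
parallel perl -e '\$a=\"1{}\"x10000000\;print\ \$a,\"\\n\"' '>' {} ::: a b c d e f
ls -l a b c d e f
# GNU Parallel: 4 jobs at a time, output buffered per job, -k keeps input order
parallel -kP4 -n1 grep 1 > out.par ::: a b c d e f
# xargs, 4 jobs at a time, no per-job buffering
echo a b c d e f | xargs -P4 -n1 grep 1 > out.xargs-unbuf
# xargs, 4 jobs at a time, grep flushing after every line
echo a b c d e f | xargs -P4 -n1 grep --line-buffered 1 > out.xargs-linebuf
# xargs, one job at a time (reference result)
echo a b c d e f | xargs -n1 grep 1 > out.xargs-serial
# compare sizes and checksums of the four results
ls -l out*
md5sum out*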

The solution is to buffer the output from each job - either in memory or in tmpfiles (like GNU Parallel does).
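
As a rough sketch of the tmpfile approach with plain xargs: each job writes to its own temporary file, and the files are concatenated once xargs has finished. The sh -c wrapper, the part.XXXXXX file naming, and the tmpdir variable are assumptions for illustration; this is not how GNU Parallel itself is implemented, and the order of lines across jobs is not preserved.

```shell
tmpdir=$(mktemp -d)
seq 1 2550 > "$tmpdir/nums"

# Each job gets the directory as $0 and its 100 numbers as "$@";
# it creates a private file with mktemp and writes only to that file.
xargs -P20 -n 100 sh -c '
    f=$(mktemp "$0/part.XXXXXX") && echo "$@" > "$f"
' "$tmpdir" < "$tmpdir/nums"

# xargs has exited, so every job is done; merge the per-job files.
cat "$tmpdir"/part.* > "$tmpdir/out"
wc "$tmpdir/out"
```

Because no two jobs ever write to the same file, the merged result does not depend on how concurrent writes to a shared file are handled.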




Answer 2:


I know this question is about xargs, but if you keep on having issues with it, then perhaps GNU Parallel may be of help. Your xargs invocation would translate to:

$ < /tmp/nums parallel -j20 -N100 echo > /tmp/out; wc /tmp/out
26  2550 11643 /tmp/out


Source: https://stackoverflow.com/questions/32450489/xargs-losing-output-when-redirecting-stdout-to-a-file-in-parallel-mode
