bash: process list of files in chunks

Submitted by 北城余情 on 2019-12-01 17:51:22

You can do:

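i=0
opfiles=
mkfifo /tmp/foo
# Writer: emit the file names three per line into the fifo, in the background
echo *dat | xargs -n 3 >/tmp/foo &
# Reader: the loop runs in the current shell, so i and opfiles survive it
while read -r threefiles; do
    # $threefiles is deliberately unquoted so it splits into separate arguments
    ./cmd tmp_output$i.dat $threefiles
    opfiles="$opfiles tmp_output$i.dat"
    ((i++))
done </tmp/foo
rm -f /tmp/foo
wait
# Combine the intermediate outputs, then clean them up
./cmd output.dat $opfiles
rm $opfiles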

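You need the fifo (rather than a pipe into the while loop) so that the loop runs in the current shell: that keeps the value of i, as well as the list of files in opfiles that is needed for the final concatenation.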
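To illustrate why the redirection matters, here is a minimal sketch comparing a pipe into while with a redirection into while. The variable names (count, piped_count, redirected_count) and the sample input are made up for the demonstration, and a here-document stands in for the fifo.

```shell
# Case 1: pipe into while. The loop runs in a subshell, so the increments are lost.
count=0
printf 'a\nb\nc\n' | while read -r line; do
    count=$((count + 1))
done
piped_count=$count

# Case 2: redirect into while (a here-document here, a fifo in the answer above).
# The loop runs in the current shell, so the increments are kept.
count=0
while read -r line; do
    count=$((count + 1))
done <<EOF
a
b
c
EOF
redirected_count=$count

echo "piped: $piped_count, redirected: $redirected_count"
```

With bash's default settings (lastpipe not enabled) this should print "piped: 0, redirected: 3".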

If you want, you can run the inner invocations of ./cmd in the background; the wait before the final invocation of ./cmd then makes sure they have all finished:

i=0
opfiles=
mkfifo /tmp/foo
echo *dat | xargs -n 3 >/tmp/foo&
while read -r threefiles; do
    ./cmd tmp_output$i.dat $threefiles &    # run each chunk in the background
    opfiles="$opfiles tmp_output$i.dat"
    ((i++))
done </tmp/foo
rm -f /tmp/foo
wait
./cmd output.dat $opfiles
rm $opfiles
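As a rough sketch of the background-then-wait pattern on its own: slow_cmd below is a hypothetical stand-in for ./cmd (it only sleeps and writes a marker file), and the scratch directory is created just for the demonstration.

```shell
workdir=$(mktemp -d)

# Hypothetical stand-in for ./cmd: writes its output file after a short delay
slow_cmd() {
    sleep 1
    echo done >"$1"
}

# Start three jobs in the background; the loop itself returns immediately
for n in 0 1 2; do
    slow_cmd "$workdir/tmp_output$n.dat" &
done

# Block until every background job has finished
wait

# Only now is it safe to read the intermediate files
finished=$(ls "$workdir" | wc -l | tr -d ' ')
echo "finished jobs: $finished"
rm -rf "$workdir"
```

Without the wait, the final step could run before some of the tmp_output files exist.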

Update: if you want to avoid using a fifo entirely, you can use process substitution instead. The first version then becomes:

i=0
opfiles=()
while read -r threefiles; do
    ./cmd tmp_output$i.dat $threefiles
    opfiles+=("tmp_output$i.dat")
    ((i++))
done < <(echo *dat | xargs -n 3)
./cmd output.dat "${opfiles[@]}"
rm "${opfiles[@]}"

This again avoids piping into the while loop; reading from a redirection instead keeps the opfiles variable available after the loop.

Try the following; it should work for you, provided ./cmd can be invoked once per group of three files with the same output.dat:

echo *dat | xargs -n3 ./cmd output.dat
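To preview how xargs -n3 splits the list before running anything, you can put echo in front of the command. The sketch below assumes a scratch directory with seven made-up sample files (file00.dat to file06.dat); nothing is actually executed.

```shell
workdir=$(mktemp -d)
touch "$workdir/file00.dat" "$workdir/file01.dat" "$workdir/file02.dat" \
      "$workdir/file03.dat" "$workdir/file04.dat" "$workdir/file05.dat" \
      "$workdir/file06.dat"

# The extra echo makes xargs print each command line instead of running ./cmd.
# The subshell keeps the cd from affecting the calling shell.
preview=$(cd "$workdir" && echo *dat | xargs -n3 echo ./cmd output.dat)

echo "$preview"
rm -rf "$workdir"
```

This prints three command lines: two with three input files each and a last one with only file06.dat. Each of them names the same output.dat, so whether that is acceptable depends on how ./cmd treats an existing output file.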

EDIT: In response to your comment:

for i in {0..9}; do
    echo file${i}*.dat | xargs -n3 ./cmd output${i}.dat
done

That sends no more than three files at a time to ./cmd, while going over all files from file00.dat to file99.dat and producing 10 different output files, output0.dat to output9.dat.

I know that this question was answered and accepted a long time ago, but I think there is a simpler solution than those offered so far.

find . -name '*.dat' | xargs -n3 | xargs -n3 your_command

For more fine-grained control, or to manipulate the argument string further, use the following form (substitute bash for sh if you prefer). With -I{}, xargs already handles one input line (one group of three files) per invocation, so -n3 is not needed on the second xargs:

find . -name '*.dat' | xargs -n3 | xargs -I{} sh -c 'your_command {}'

To run the commands in parallel (say, 2 processes at a time):

find . -name '*.dat' | xargs -n3 | xargs -P2 -I{} sh -c 'your_command {}'

NOTE: This will not work for file names that contain spaces.
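If names with spaces are a concern, one possible workaround (a different technique from the pipeline above) is to pass NUL-delimited names with find -print0 and xargs -0, which GNU and BSD versions of both tools support. In this sketch the sample files are made up, and the inline sh script is only a stand-in for your_command that prints how many arguments it received.

```shell
# Scratch directory with names containing spaces (hypothetical sample data)
workdir=$(mktemp -d)
touch "$workdir/a 1.dat" "$workdir/b 2.dat" "$workdir/c 3.dat" "$workdir/d 4.dat"

# -print0 / -0 separate names with NUL bytes, so spaces inside a name are preserved.
# The trailing "sh" fills $0, so the file names arrive as $1, $2, $3.
batch_sizes=$(find "$workdir" -name '*.dat' -print0 |
    xargs -0 -n3 sh -c 'echo "$#"' sh)

echo "$batch_sizes"
rm -rf "$workdir"
```

The four files arrive as one batch of 3 and one batch of 1. If the names had been split on spaces, there would be eight words and the batches would be 3, 3 and 2.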
