Parallel download using Curl command line utility

抹茶落季 2020-12-13 19:08

I want to download some pages from a website and I did it successfully using curl but I was wondering if somehow curl downloads multiple pages at a

8 answers
  •  伪装坚强ぢ
    2020-12-13 19:25

    My answer is a bit late, but I believe all of the existing answers fall just a little short. The way I do things like this is with xargs, which is capable of running a specified number of commands in subprocesses.

    The one-liner I would use is, simply:

    $ seq 1 10 | xargs -n1 -P2 bash -c 'i=$0; url="http://example.com/?page${i}.html"; curl -O -s "$url"'
    

    This warrants some explanation. The use of -n 1 instructs xargs to process a single input argument at a time. In this example, the numbers 1 ... 10 are each processed separately. And -P 2 tells xargs to keep 2 subprocesses running all the time, each one handling a single argument, until all of the input arguments have been processed.
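
    To see how -n1 and -P2 split up the work without touching the network, here is a minimal dry-run sketch. It swaps curl for echo, so each worker only prints the command it would run. The variable name fetch_plan and the trailing sort (there only to make the output order stable, since parallel workers finish in no fixed order) are assumptions of this sketch. It also passes each number as $1 after a placeholder _ instead of as $0, which is the more conventional way to hand arguments to bash -c.

```shell
# Dry run: print the curl command each worker would execute instead of running it.
# -n1 hands one number to each bash invocation; -P2 keeps at most two running at once.
# The "_" fills $0, so the number from seq arrives as $1.
fetch_plan=$(seq 1 10 \
  | xargs -n1 -P2 bash -c 'echo "curl -O -s http://example.com/?page$1.html"' _ \
  | sort)

printf '%s\n' "$fetch_plan"
```

    Once the printed commands look right, replacing the echo with the real curl call gives the actual parallel download.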

    You can think of this as MapReduce in the shell. Or perhaps just the Map phase. Regardless, it's an effective way to get a lot of work done while ensuring that you don't fork bomb your machine. It's possible to do something similar with a for loop in the shell, but you end up doing the process management yourself, which starts to seem pretty pointless once you realize how insanely great this use of xargs is.
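
    For comparison, the for-loop approach with manual process management might look roughly like the sketch below. It is an illustration only: echo stands in for the curl call, and max_jobs and results are hypothetical names. It also waits for each whole batch of two to finish before starting the next, which is cruder than xargs -P, where a new job starts as soon as any slot frees up.

```shell
# Hand-rolled approximation of `xargs -P2`: launch jobs in the background
# and pause after every batch of max_jobs until the whole batch has finished.
max_jobs=2
results=$(
  for i in $(seq 1 10); do
    # Stand-in for: curl -O -s "http://example.com/?page${i}.html" &
    echo "fetched page ${i}" &
    # After each full batch, block until all background jobs are done.
    if [ $((i % max_jobs)) -eq 0 ]; then
      wait
    fi
  done
  wait  # pick up any jobs left over from a partial final batch
)

printf '%s\n' "$results"
```

    Even in this small form, the batching and the wait bookkeeping are exactly the housekeeping that xargs does for you.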

    Update: I suspect that my example with xargs could be improved (at least on Mac OS X and BSD with the -J flag). With GNU Parallel, the command is a bit less unwieldy as well:

    parallel --jobs 2 curl -O -s 'http://example.com/?page{}.html' ::: {1..10}
    
