Parallel download using Curl command line utility

Backend · Open · 8 answers · 659 views
抹茶落季 asked 2020-12-13 19:08

I want to download some pages from a website, and I did it successfully using curl, but I was wondering whether curl can somehow download multiple pages at a time.
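
For reference, newer curl releases (7.66.0 and later) have a built-in parallel mode combined with URL globbing; a hedged sketch, where the URL pattern and output names are placeholders:

```shell
# curl 7.66+ only: fetch pages 1-10, up to 5 transfers at once.
# '[1-10]' is curl's URL range globbing; '#1' in -o is replaced
# by the current value of that range.
curl --parallel --parallel-max 5 \
     -o 'page_#1.html' 'http://www.example.com/?page=[1-10]'
```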

8 answers
  • 2020-12-13 19:48

    I am not sure about curl, but you can do that using wget.

    wget \
         --recursive \
         --no-clobber \
         --page-requisites \
         --html-extension \
         --convert-links \
         --restrict-file-names=windows \
         --domains website.org \
         --no-parent \
             www.website.org/tutorials/html/
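
A hedged alternative that stays with curl: pipe page numbers into `xargs -P`, which runs several curl processes at once. This assumes an xargs that supports `-P` (GNU or BSD), and the URL here is a placeholder:

```shell
# Placeholder URL; adjust the range and concurrency to taste.
# -P 5 runs up to five curl processes in parallel;
# -I{} substitutes each page number into the command.
seq 1 10 | xargs -P 5 -I{} \
    curl -s 'http://www.example.com/?page={}' -o '{}.html'
```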
    
  • 2020-12-13 19:50

    For launching parallel commands, why not use the venerable make command-line utility? It supports parallel execution, dependency tracking, and more.

    How? In the directory where you are downloading the files, create a new file called Makefile with the following contents:

    # which page numbers to fetch
    numbers := $(shell seq 1 10)
    
    # default target which depends on files 1.html .. 10.html
    # (patsubst replaces % with %.html for each number)
    all: $(patsubst %,%.html,$(numbers))
    
    # the rule which tells how to generate a %.html dependency
    # $@ is the target filename e.g. 1.html
    %.html:
            curl -C - 'http://www...../?page='$(patsubst %.html,%,$@) -o $@.tmp
            mv $@.tmp $@
    

    NOTE The last two lines should start with a TAB character (instead of 8 spaces) or make will not accept the file.

    Now you just run:

    make -k -j 5
    

    The curl command I used stores its output in 1.html.tmp, and only if curl succeeds is the file renamed to 1.html (by the mv command on the next line). Thus, if some download fails, you can simply re-run the same make command and it will resume/retry the files that failed to download the first time. Once all files have been downloaded successfully, make will report that there is nothing more to be done, so there is no harm in running it one extra time to be "safe".

    (The -k switch tells make to keep downloading the rest of the files even if one single download should fail.)
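
    Since `numbers` is an ordinary make variable, you can also override it on the command line to fetch a different range without editing the Makefile. A hypothetical invocation (the `| xargs` joins seq's output into a single space-separated word list):

```shell
# Override the numbers variable for this run only (pages 11-20).
make -k -j 5 numbers="$(seq 11 20 | xargs)"
```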
