Using wget to recursively fetch a directory with arbitrary files in it

再見小時候 2020-11-27 08:31

I have a web directory where I store some config files. I'd like to use wget to pull those files down and maintain their current structure. For instance, the remote directo

14 answers
  • 2020-11-27 09:17

    All you need is two flags: -r for recursion and --no-parent (or -np) so that wget never ascends to the parent directory. Like this:

    wget -r --no-parent http://example.com/configs/.vim/

    That's it. The files land in the local tree ./example.com/configs/.vim . If you do not want the leading example.com/configs/ part, add -nH (--no-host-directories) to drop the host directory and --cut-dirs=1 to drop configs/ (as suggested in other replies; note that --cut-dirs counts only remote directory components, not the host name):

    wget -r --no-parent -nH --cut-dirs=1 http://example.com/configs/.vim/

    That downloads your file tree into ./.vim/ only.

    In fact, I took the first command in this answer straight from the wget manual, which has a very clean example towards the end of section 4.3.
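    To make the effect of -nH and --cut-dirs concrete, here is a rough shell sketch of how the local directory is derived from the URL, following the table in the wget manual's description of --cut-dirs. The helper name local_dir and its arguments are made up for illustration and are not part of wget; it also assumes the URL ends in a directory.

```shell
# local_dir URL [CUT_DIRS] [NO_HOST]
# Hypothetical helper: prints the directory wget -r would save into.
#   CUT_DIRS  number passed to --cut-dirs (default 0)
#   NO_HOST   1 if -nH / --no-host-directories is given (default 0)
local_dir() {
    url=$1; cut=${2:-0}; no_host=${3:-0}
    rest=${url#*://}             # drop the scheme
    host=${rest%%/*}             # host name, e.g. example.com
    path=${rest#"$host"}         # remote path, e.g. /configs/.vim/
    path=${path#/}; path=${path%/}
    # --cut-dirs removes leading remote directory components only
    i=0
    while [ "$i" -lt "$cut" ] && [ -n "$path" ]; do
        case $path in
            */*) path=${path#*/} ;;
            *)   path= ;;
        esac
        i=$((i + 1))
    done
    if [ "$no_host" -eq 1 ]; then
        out=$path
    else
        out=$host${path:+/$path}
    fi
    printf './%s\n' "$out"
}
```

    For example, local_dir http://example.com/configs/.vim/ prints ./example.com/configs/.vim, while local_dir http://example.com/configs/.vim/ 1 1 prints ./.vim.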

  • 2020-11-27 09:18

    A newer wget such as 1.18 may work better; I got bitten by a bug in version 1.12 where...

    wget --recursive (...)
    

    ...only retrieves index.html instead of all files.

    The workaround was to notice some 301 redirects and try the new location; given the new URL, wget got all the files in the directory.
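    One way to spot such a redirect before recursing is to ask wget for the response headers only and look for a Location header. The sketch below assumes the usual -S (--server-response) output, with each header on its own line; last_location is a hypothetical helper name, not something wget provides.

```shell
# last_location: read wget's header output on stdin and print the target of
# the last Location header, if there is one (hypothetical helper).
last_location() {
    awk 'tolower($1) == "location:" { loc = $2 } END { if (loc != "") print loc }'
}

# Usage (needs network access; wget writes the headers to stderr):
#   wget --spider --server-response http://example.com/old/ 2>&1 | last_location
```

    If this prints a URL, pass that URL to the recursive wget command instead of the original one.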

  • 2020-11-27 09:23

    Here's the complete wget command that worked for me to download files from a server's directory (ignoring robots.txt):

    wget -e robots=off --cut-dirs=3 --user-agent=Mozilla/5.0 --reject="index.html*" --no-parent --recursive --relative --level=1 --no-directories http://www.example.com/archive/example/5.3.0/
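    For readers who want to see what each option in that command does, here is the same invocation as an annotated shell function. This is only a sketch: fetch_flat and the DRY_RUN variable are hypothetical names, and --cut-dirs=3 is left out because, as far as the manual describes it, --no-directories already saves every file into the current directory, so there are no directory components left to cut.

```shell
# fetch_flat URL: run the command above, or just print it when DRY_RUN=1.
# fetch_flat and DRY_RUN are illustrative names; the flags are wget options.
fetch_flat() {
    url=$1
    set --                                   # start from an empty argument list
    set -- "$@" -e robots=off                # ignore robots.txt
    set -- "$@" --user-agent=Mozilla/5.0     # send a browser-like User-Agent
    set -- "$@" --reject='index.html*'       # do not keep index.html* listings
    set -- "$@" --no-parent                  # never ascend above the start directory
    set -- "$@" --recursive --level=1        # recurse, but only one level deep
    set -- "$@" --relative                   # follow relative links only
    set -- "$@" --no-directories             # save everything into the current directory
    set -- "$@" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'wget %s\n' "$*"
    else
        wget "$@"
    fi
}
```

    Running DRY_RUN=1 fetch_flat http://www.example.com/archive/example/5.3.0/ prints the assembled command line without touching the network.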
    
  • 2020-11-27 09:25

    If --no-parent does not help, you can use the --include option (short for --include-directories, also spelled -I).

    Directory structure:

    http://<host>/downloads/good
    http://<host>/downloads/bad
    

    Suppose you want to download the downloads/good directory but not downloads/bad:

    wget --include downloads/good --mirror --execute robots=off --no-host-directories --cut-dirs=1 --reject="index.html*" --continue http://<host>/downloads/good
    
  • 2020-11-27 09:25
    wget -r http://mysite.com/configs/.vim/
    

    works for me.

    Perhaps you have a .wgetrc which is interfering with it?

  • 2020-11-27 09:28

    You should use the -m (--mirror) flag, which turns on time-stamping and recursion with unlimited depth (it is currently equivalent to -r -N -l inf --no-remove-listing).

    wget -m http://example.com/configs/.vim/
    

    Combined with the options others have mentioned in this thread, that becomes:

    wget -m -e robots=off --no-parent http://example.com/configs/.vim/
    