Using wget to recursively fetch a directory with arbitrary files in it

再見小時候 2020-11-27 08:31

I have a web directory where I store some config files. I'd like to use wget to pull those files down and maintain their current structure. For instance, the remote directory…

14 Answers
  • 2020-11-27 09:08

You have to pass the -np/--no-parent option to wget (in addition to -r/--recursive, of course); otherwise it will follow the link in the auto-generated directory index back up to the parent directory. So the command would look like this:

    wget --recursive --no-parent http://example.com/configs/.vim/
    

    To avoid downloading the auto-generated index.html files, use the -R/--reject option:

    wget -r -np -R "index.html*" http://example.com/configs/.vim/
    
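The -R pattern is a shell-style glob matched against filenames. As a quick local sketch (the filenames here are made up), this shows which names the "index.html*" pattern would reject — including the query-string variants a directory index generates:

```shell
# Hypothetical filenames; the case glob mirrors wget's -R "index.html*" match.
for f in index.html "index.html?C=N;O=D" .vimrc vimrc; do
  case "$f" in
    index.html*) echo "rejected: $f" ;;
    *)           echo "kept: $f" ;;
  esac
done
```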
  • 2020-11-27 09:08

    Recursive wget ignoring robots (for websites)

    wget -e robots=off -r -np --page-requisites --convert-links 'http://example.com/folder/'
    

    -e robots=off causes it to ignore robots.txt for that domain

    -r makes it recursive

    -np = no parents, so it doesn't follow links up to the parent folder
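Putting these flags together with the index-rejection trick from the earlier answer (the URL is a placeholder, and --wait=1 is an optional extra to throttle requests politely):

```shell
# Recursive mirror of one folder, ignoring robots.txt, skipping index pages,
# and pausing one second between requests.
wget -e robots=off -r -np -R "index.html*" --wait=1 \
     --page-requisites --convert-links 'http://example.com/folder/'
```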

  • 2020-11-27 09:10

To download a directory recursively, rejecting index.html* files, and without recreating the hostname, the parent directories, or the full remote directory structure locally:

    wget -r -nH --cut-dirs=2 --no-parent --reject="index.html*" http://mysite.com/dir1/dir2/data
    
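As a sketch of what -nH and --cut-dirs=2 do to the saved path (simulated locally with shell parameter expansion; the filename is hypothetical):

```shell
remote="mysite.com/dir1/dir2/data/file.conf"
no_host="${remote#*/}"       # -nH drops the hostname: dir1/dir2/data/file.conf
step1="${no_host#*/}"        # --cut-dirs=1 would leave: dir2/data/file.conf
local_path="${step1#*/}"     # --cut-dirs=2 leaves: data/file.conf
echo "$local_path"           # data/file.conf
```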
  • 2020-11-27 09:10

For anyone else having similar issues: wget respects robots.txt, which might not allow you to grab the site. No worries, you can turn that off:

    wget -e robots=off http://www.example.com/
    

    http://www.gnu.org/software/wget/manual/html_node/Robot-Exclusion.html

  • 2020-11-27 09:12

The following options seem to be the perfect combination when dealing with recursive downloads:

    wget -nd -np -P /dest/dir --recursive http://url/dir1/dir2

    Relevant snippets from man pages for convenience:

       -nd
       --no-directories
           Do not create a hierarchy of directories when retrieving recursively.  With this option turned on, all files will get saved to the current directory, without clobbering (if a name shows up more than once, the
           filenames will get extensions .n).
    
    
       -np
       --no-parent
           Do not ever ascend to the parent directory when retrieving recursively.  This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded.
    
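The ".n" suffix behavior described above can be simulated locally (the directory names and file contents here are made up) — with -nd, a repeated basename is saved with numeric suffixes .1, .2, and so on instead of being clobbered:

```shell
# Local simulation of wget's no-clobber numbering under -nd.
demo=$(mktemp -d)
for src in dir1/config dir2/config dir3/config; do
  name="config"
  if [ -e "$demo/$name" ]; then
    n=1
    while [ -e "$demo/$name.$n" ]; do n=$((n+1)); done
    name="$name.$n"
  fi
  printf '%s\n' "$src" > "$demo/$name"
done
ls "$demo"    # config  config.1  config.2
```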
  • 2020-11-27 09:14

To fetch a directory recursively with a username and password, use the following command:

    wget -r --user=(put username here) --password='(put password here)' --no-parent http://example.com/
    
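A note of caution on the inline password: it ends up in your shell history and is visible in `ps` output. wget can prompt for it interactively instead, or read credentials from ~/.netrc (the username and hostname below are placeholders):

```shell
# Prompt for the password rather than passing it on the command line:
wget -r --no-parent --user=myuser --ask-password http://example.com/

# Or keep credentials in ~/.netrc (chmod 600); wget reads it by default:
#   machine example.com
#   login myuser
#   password mypass
```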