How to download all files (but not HTML) from a website using wget?

生来不讨喜 2020-11-29 14:19

How can I use wget to download all the files from a website?

I need every file except web page files such as HTML, PHP, and ASP.

8 answers
  •  夕颜 (OP)
    2020-11-29 15:11

    You may try:

    wget --user-agent=Mozilla --content-disposition --mirror --convert-links -E -K -p http://example.com/
    

    You can also add:

    -A pdf,ps,djvu,tex,doc,docx,xls,xlsx,gz,ppt,mp4,avi,zip,rar
    

    to accept only the listed extensions. To reject specific extensions instead, use:

    -R html,htm,asp,php
    

    or, to exclude specific directories:

    -X "search*,forum*"
    

    If robots.txt tells crawlers (e.g. search engines) to skip the files, also add: -e robots=off
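
The options above can be combined into a single call. The following is a minimal sketch, not a definitive recipe: the function name fetch_non_html, the DRY_RUN switch, and the variables SITE_URL, REJECT_EXT, and SKIP_DIRS are hypothetical names chosen for illustration, and the URL is the placeholder from the answer. By default it only prints the arguments, one per line, so the command can be reviewed before anything is downloaded.

```shell
#!/bin/sh
# Sketch only: combines the options from the answer into one wget call.
# SITE_URL, REJECT_EXT and SKIP_DIRS are placeholder values (assumptions).
SITE_URL="http://example.com/"
REJECT_EXT="html,htm,asp,php"   # extensions passed to -R
SKIP_DIRS="search*,forum*"      # directory patterns passed to -X

# Build the argument list for the URL given as $1.
# With DRY_RUN=1 (the default here) print one argument per line;
# with DRY_RUN=0 execute wget with exactly those arguments.
fetch_non_html() {
    set -- wget --user-agent=Mozilla --content-disposition \
        --mirror --convert-links -E -K -p \
        -R "$REJECT_EXT" -X "$SKIP_DIRS" -e robots=off "$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$@"
    else
        "$@"
    fi
}

fetch_non_html "$SITE_URL"
```

Setting DRY_RUN=0 in the environment makes the function run wget for real. Note that during a recursive run wget still fetches HTML pages in order to follow their links; pages matching -R are deleted after they have been parsed.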
