How to download all files (but not HTML) from a website using wget?

生来不讨喜 2020-11-29 14:19

How do I use wget to download all the files from a website?

I need all the files except web page files like HTML, PHP, ASP, etc.

8 Answers
  • 2020-11-29 14:48

    This downloaded the entire website for me:

    wget --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://site/path/
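
    A quick gloss of those options, for anyone reading the command cold (http://site/path/ is a placeholder to substitute):

    # --no-clobber     skip files that already exist locally
    # --convert-links  rewrite links in saved pages for offline browsing
    # --random-wait    randomize the pause between requests
    # -r               recursive retrieval
    # -p               fetch the page requisites (images, CSS) for each page
    # -E               save text/html documents with an .html extension
    # -e robots=off    ignore robots.txt exclusions
    # -U mozilla       identify as a Mozilla browser in the User-Agent header
    wget --no-clobber --convert-links --random-wait -r -p -E -e robots=off -U mozilla http://site/path/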
    
  • 2020-11-29 14:53

    I was trying to download the zip files linked from Omeka's themes page, a pretty similar task. This worked for me:

    wget -A zip -r -l 1 -nd http://omeka.org/add-ons/themes/
    
    • -A: only accept zip files
    • -r: recurse
    • -l 1: one level deep (i.e., only files directly linked from this page)
    • -nd: don't create a directory structure, just download all the files into this directory.

    The answers recommending the -k, -K, -E etc. options probably haven't really understood the question, as those are for rewriting HTML pages to make a local structure, renaming .php files and so on. Not relevant here.

    To literally get all files except .html etc.:

    wget -R html,htm,php,asp,jsp,js,py,css -r -l 1 -nd http://yoursite.com
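
    Note that wget still fetches the HTML pages themselves in order to discover links; with -R they are deleted after parsing, so .html files may appear briefly during the crawl. If the files sit more than one link away from the starting page, raise the recursion depth; a minimal variant, assuming a depth of 2 suits your site:

    wget -R html,htm,php,asp,jsp,js,py,css -r -l 2 -nd http://yoursite.com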
    
  • 2020-11-29 14:58

    On Windows systems, to get wget you can either:

    1. download Cygwin
    2. download GnuWin32
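
    Either way, a quick sanity check from the resulting shell confirms the binary is on your PATH:

    wget --version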
  • 2020-11-29 15:03

    Try this; it always works for me:

    wget --mirror -p --convert-links -P ./LOCAL-DIR WEBSITE-URL
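
    --mirror is shorthand: per the wget manual it expands to -r -N -l inf --no-remove-listing, so the command above is equivalent to the following (LOCAL-DIR and WEBSITE-URL are placeholders, as in the answer):

    wget -r -N -l inf --no-remove-listing -p --convert-links -P ./LOCAL-DIR WEBSITE-URL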
    
  • 2020-11-29 15:06

    To filter for specific file extensions:

    wget -A pdf,jpg -m -p -E -k -K -np http://site/path/
    

    Or, if you prefer long option names:

    wget --accept pdf,jpg --mirror --page-requisites --adjust-extension --convert-links --backup-converted --no-parent http://site/path/
    

    This will mirror the site, but files without a jpg or pdf extension will be automatically removed: wget still downloads the HTML pages in order to follow their links, then deletes anything that does not match the accept list.
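
    If the mirror stops short because the site's robots.txt disallows crawling, the -e robots=off override from the first answer can be combined with this approach; a sketch using the same placeholder URL:

    wget --accept pdf,jpg --mirror --page-requisites --adjust-extension --convert-links --backup-converted --no-parent -e robots=off http://site/path/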

  • 2020-11-29 15:08
    wget -m -p -E -k -K -np http://site/path/
    

    The man page will tell you what those options do.

    wget only follows links: if there is no link to a file from the index page, wget will not know of its existence and hence will not download it. In other words, it helps if all the files are linked to from web pages or from directory indexes.
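
    If recursion cannot reach the files but you already have their direct URLs (from a sitemap or a server listing, say), you can skip crawling and feed wget a list instead; a minimal sketch, assuming a urls.txt you have prepared yourself with one URL per line:

    wget -i urls.txt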
