Why does wget only download the index.html for some websites?

陌清茗 2020-12-12 13:29

I'm trying to use the wget command:

wget -p http://www.example.com 

to fetch all the files on the main page. For some websites it works but i

8 answers
  •  暖寄归人
    2020-12-12 14:05

    If you only get the index.html and that file looks like it only contains binary data (i.e. no readable text, only control characters), then the site is probably sending the data using gzip compression.

    You can confirm this by running cat index.html | gunzip to see if it outputs readable HTML.
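    The same check can be scripted so it does not depend on eyeballing the output. This is only a sketch: is_gzip and peek_html are hypothetical helper names, and the test relies on gzip streams starting with the two magic bytes 0x1f 0x8b.

```shell
# Hypothetical helpers for illustration; they are not part of wget or gzip.

# is_gzip FILE: succeed if FILE starts with the gzip magic bytes 0x1f 0x8b.
is_gzip() {
    # od prints the first two bytes in hex; tr strips spaces and the newline.
    magic=$(od -An -tx1 -N2 "$1" | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# peek_html FILE: print FILE as text, decompressing it first if it is gzip.
peek_html() {
    if is_gzip "$1"; then
        gunzip < "$1"
    else
        cat "$1"
    fi
}
```

    For example, peek_html index.html | head -n 5 should show the start of the page whether or not the server compressed it.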

    If this is the case, then wget's recursive feature (-r) won't work, and the same goes for -p, because wget cannot read the links out of the compressed HTML. There is a patch for wget to work with gzip-compressed data, but it doesn't seem to be in the standard release yet.
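    Depending on your wget version, a patch may not be needed. As far as I know, wget 1.19.2 and later, when built with zlib, have an experimental --compression option that decompresses gzip responses. The sketch below uses it only if wget --help lists it; supports_compression and fetch_page are hypothetical helper names, and I have not verified that every site then downloads completely.

```shell
# Hypothetical helpers; only the wget options themselves come from wget.

# supports_compression: read `wget --help` text on stdin and succeed if it
# lists the --compression option.
supports_compression() {
    grep -q -e '--compression'
}

# fetch_page URL: download URL and its page requisites, asking wget to
# decompress gzip responses when this build offers the option.
fetch_page() {
    if wget --help 2>/dev/null | supports_compression; then
        wget --compression=auto -p "$1"
    else
        # Older builds: plain -p, which may still save compressed data.
        wget -p "$1"
    fi
}
```

    Usage would be fetch_page http://www.example.com, which falls back to the original command on older builds.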
