How do you archive an entire website for offline viewing?

Submitted by 房东的猫 on 2019-12-02 14:00:56

On Windows, take a look at HTTrack. It's very configurable and lets you set things like the download speed, but you can also just point it at a website and run it with no configuration at all.

In my experience it's been a really good tool and works well. Some of the things I like about HTTrack are:

  • Open Source license
  • Resumes stopped downloads
  • Can update an existing archive
  • You can configure it to be non-aggressive when it downloads so it doesn't waste your bandwidth and the bandwidth of the site.
chuckg
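
To make the "non-aggressive" point concrete, here is a rough sketch using HTTrack's command-line client. The function name, the URL, the output directory and the specific limits are all placeholders I picked for illustration, not values recommended by the answer above. -A caps the transfer rate in bytes per second and -c sets the number of simultaneous connections.

```shell
# Hypothetical wrapper: mirror a site with HTTrack while keeping the load low.
#   $1 = start URL, $2 = output directory (both placeholders)
polite_httrack() {
    # -O       where the mirror, logs and cache are written
    # -A25000  cap the transfer rate at roughly 25 KB/s (arbitrary example value)
    # -c2      use at most 2 simultaneous connections (arbitrary example value)
    httrack "$1" -O "$2" -A25000 -c2
}
```

Called as polite_httrack "https://example.com/" ./mirror, this should download slowly enough not to bother most sites; tune the numbers to taste.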

You could use wget:

wget -m -k -K -E http://url/of/web/site
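
For anyone who finds the short flags cryptic, the same command can be spelled with long options. This is only a sketch: the function name mirror_site is made up for readability, and --adjust-extension assumes a reasonably recent wget (older releases called it --html-extension).

```shell
# Long-option spelling of: wget -m -k -K -E <url>
# mirror_site is a hypothetical helper name; $1 is the site URL.
mirror_site() {
    # --mirror            recursive download with timestamping and unlimited depth
    # --convert-links     rewrite links so the local copy can be browsed offline
    # --backup-converted  keep a .orig copy of each file before its links are rewritten
    # --adjust-extension  append .html to HTML pages whose URL lacks that suffix
    wget --mirror --convert-links --backup-converted --adjust-extension "$1"
}
```

Usage would be mirror_site "http://url/of/web/site", run from the directory where the copy should end up.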

The Wayback Machine Downloader by hartator is simple and fast.

Install via Ruby, then run with the desired domain and optional timestamp from the Internet Archive.

sudo gem install wayback_machine_downloader
mkdir example
cd example
wayback_machine_downloader http://example.com --timestamp 19700101000000

I use Blue Crab on OS X and WebCopier on Windows.

wget -r -k

... and investigate the rest of the options. I hope you've followed these guidelines: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html so that all your resources are safe to fetch with GET requests.

I just use: wget -m <url>.

Dieghito

If your customers are archiving for compliance reasons, you want to ensure that the content can be authenticated. The options listed are fine for simple viewing, but they aren't legally admissible. In that case, you're looking for timestamps and digital signatures, which is much more complicated if you're doing it yourself. I'd suggest a service such as PageFreezer.

For OS X users, I've found that the SiteSucker application works well without configuring anything except how deep it follows links.

I've been using HTTrack for several years now. It handles all of the inter-page linking, etc. just fine. My only complaint is that I haven't found a good way to keep it limited to a sub-site. For instance, if there is a site www.foo.com/steve that I want to archive, it will likely follow links to www.foo.com/rowe and archive that too. Otherwise it's great: highly configurable and reliable.
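
This doesn't solve the problem in HTTrack itself, but for comparison, the wget approach mentioned elsewhere in this thread has a --no-parent option that refuses to climb above the starting directory. A minimal sketch, where mirror_subsite is a made-up helper name and the URL is the hypothetical one from the example above:

```shell
# Hypothetical helper: mirror only the directory given in $1 and everything below it.
mirror_subsite() {
    # --no-parent  never ascend to the parent directory during recursion,
    #              so links to sibling paths on the same host are not followed
    wget --mirror --no-parent --convert-links --adjust-extension "$1"
}
```

With mirror_subsite "http://www.foo.com/steve/", pages under /steve/ are fetched and /rowe is left alone. The trailing slash matters: without it wget treats steve as a file in the root directory, and the restriction then covers the whole site.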
