Creating a static copy of a web page on the UNIX command line / shell script

Submitted by 眉间皱痕 on 2019-12-06 09:15:33

Question


I need to create a static copy of a web page (all media resources, like CSS, images and JS included) in a shell script. This copy should be openable offline in any browser.

Some browsers have similar functionality ("Save As... Web Page, complete") which creates a folder from a page and rewrites external resources as relative static resources in that folder.

What is a way to accomplish and automate this on the Linux command line for a given URL?


Answer 1:


You can use wget like this:

wget --recursive --convert-links --domains=example.org http://www.example.org

This command will recursively download every page reachable by hyperlinks from the page at www.example.org, without following links outside the example.org domain.

Check the wget manual page for more options controlling recursion.
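Since the question asks for a single page rather than a whole site, here is a minimal script sketch; the script name, output directory and variable names are hypothetical, and it relies on wget's standard --page-requisites, --convert-links, --adjust-extension and --span-hosts options to pull in the CSS, images and JS the page needs:

#!/bin/sh
# save_page.sh -- hypothetical helper: save one page plus its requisites for offline viewing
# Usage: ./save_page.sh http://www.example.org/somepage.html
url="$1"
wget --page-requisites \
     --convert-links \
     --adjust-extension \
     --span-hosts \
     --directory-prefix=saved_page \
     "$url"

The page and its resources end up under saved_page/, organized by host, with links rewritten to point at the locally saved copies.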




Answer 2:


You want the tool wget. To mirror a site, do:

$ wget -mk http://www.example.com/

Options:

-m --mirror

Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.

-k --convert-links

After the download is complete, convert the links in the document to make them suitable for local viewing. This affects not only the visible hyperlinks, but any part of the document that links to external content, such as embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
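If the goal is only a static, offline-viewable copy of one page and its media rather than a full mirror, these options are commonly combined with -p (--page-requisites), -E (--adjust-extension) and --no-parent; a sketch with a placeholder URL:

$ wget -m -k -E -p --no-parent http://www.example.com/

Here -p fetches the images, stylesheets and scripts each page needs, -E gives downloaded pages an .html extension, and --no-parent keeps the crawl from ascending above the starting directory.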



Source: https://stackoverflow.com/questions/15849696/creating-a-static-copy-of-a-web-page-on-unix-commandline-shell-script
