How do I save a web page, programmatically?

主宰稳场 posted on 2019-12-01 05:49:34

Take a look at wget, specifically the -p flag:

-p  --page-requisites
This option causes Wget to download all the files
that are necessary to properly display
a given HTML page. This includes such
things as inlined images, sounds, and
referenced stylesheets.

The following command:

wget -p http://<site>/1.html

will download 1.html and all the files it requires.

On Windows, you can run IE as a COM object and pull everything out.

Alternatively, you can build on the Mozilla source code.

In Java, there is Lobo.

Or use commons-httpclient and write a lot of code.
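
To give an idea of what "a lot of code" involves, here is a minimal sketch of the first step: finding the files a page needs. It is written in Python with the standard library rather than Java with commons-httpclient. The names RequisiteCollector and find_requisites are made up for this example, and it only looks at img, script and stylesheet link tags, so it covers far less than wget -p does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RequisiteCollector(HTMLParser):
    """Collects absolute URLs of resources a page needs in order to render.

    Hypothetical helper for illustration; handles only <img>, <script>
    and <link rel="stylesheet">.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Resolve relative paths against the URL of the page itself.
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # rel may hold several space-separated tokens.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(urljoin(self.base_url, attrs["href"]))


def find_requisites(html, base_url):
    """Return the resource URLs referenced by html, in document order."""
    parser = RequisiteCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Each returned URL would then still have to be fetched (for example with urllib.request.urlopen) and written to disk, and the links in the saved HTML rewritten to point at the local copies, which is where most of the remaining code goes.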

You could try the MHTML format (which is what IE uses). http://en.wikipedia.org/wiki/MHTML

In other words, you download each object (image, CSS, etc.) to your computer and then "embed" them, Base64-encoded, into a single file.
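
The embedding step can be sketched in Python with the standard email package, since an MHTML file is a MIME multipart/related message. This is only a rough illustration: build_mhtml is a hypothetical helper, the resources are assumed to have been downloaded already, and the output is not guaranteed to open in IE or any other browser without further headers.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url, html, resources):
    """Bundle a page and its resources into one MIME multipart/related string.

    resources is a list of (url, mime_type, raw_bytes) tuples that the
    caller has already downloaded (an assumption of this sketch).
    """
    archive = MIMEMultipart("related")

    # The HTML document itself is the first part.
    root = MIMEText(html, "html", "utf-8")
    root.add_header("Content-Location", page_url)
    archive.attach(root)

    for url, mime_type, data in resources:
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        # Base64-encode the payload and set Content-Transfer-Encoding.
        encoders.encode_base64(part)
        # Content-Location records the original URL of the resource.
        part.add_header("Content-Location", url)
        archive.attach(part)

    return archive.as_string()
```

The returned string can be written to a file with an .mht extension. Every image or stylesheet ends up inside that one file as a Base64 block labelled with its original URL.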
