Save complete web page (incl css, images) using python/selenium

一个人的身影 2020-12-14 17:43

I am using Python/Selenium to submit genetic sequences to an online database, and want to save the full page of results I get back. Below is the code that gets me to the res

4 answers

自闭症患者 (OP)

2020-12-14 18:24
Inspired by FThompson's answer above, I wrote the following tool, which downloads the complete HTML for a given page URL (see: https://github.com/markfront/SinglePageFullHtml)

UPDATE: following Max's suggestion, here are the steps to use the tool:
1. Clone the project, then build it with Maven:
```
$> git clone https://github.com/markfront/SinglePageFullHtml.git
$> cd SinglePageFullHtml
$> mvn clean compile package
```
2. Find the generated jar file in the target folder: SinglePageFullHtml-1.0-SNAPSHOT-jar-with-dependencies.jar
3. Run the jar from the command line:
```
$> java -jar ./target/SinglePageFullHtml-1.0-SNAPSHOT-jar-with-dependencies.jar 
```
4. The result file name has the prefix "FP", followed by the hash code of the page URL, with the file extension ".html". Look for it in "/tmp" (the directory returned by System.getProperty("java.io.tmpdir") in Java). If it is not there, try your home directory (System.getProperty("user.home") in Java).
5. The result file is one large, self-contained HTML file that includes everything (CSS, JavaScript, images, etc.) referenced by the original HTML source.
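
Since the question is about Python, here is a minimal sketch of the same general idea (embedding referenced resources into one HTML file) using only the standard library. It is not the code of the tool above, and it is deliberately limited: it only handles double-quoted `<img src="...">` and `<link rel="stylesheet" href="...">` tags. The names `to_data_uri`, `inline_resources` and the `fetch` callable are hypothetical, introduced just for this illustration.

```python
import base64
import mimetypes
import re
from urllib.parse import urljoin


def to_data_uri(url, content):
    """Encode raw bytes as a data: URI, guessing the MIME type from the URL."""
    mime, _ = mimetypes.guess_type(url)
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def inline_resources(html, base_url, fetch):
    """Return html with images and stylesheets embedded in the markup.

    fetch is a hypothetical callable (an assumption of this sketch) that
    takes an absolute URL and returns the resource as bytes.
    """
    def replace_img(match):
        # Resolve the (possibly relative) src and swap it for a data: URI.
        url = urljoin(base_url, match.group(2))
        return match.group(1) + to_data_uri(url, fetch(url)) + match.group(3)

    def replace_link(match):
        tag = match.group(0)
        href = re.search(r'\bhref="([^"]+)"', tag)
        # Leave non-stylesheet <link> tags (icons, canonical, ...) untouched.
        if 'rel="stylesheet"' not in tag or href is None:
            return tag
        url = urljoin(base_url, href.group(1))
        # Assumes the stylesheet is UTF-8 encoded.
        return "<style>" + fetch(url).decode("utf-8") + "</style>"

    html = re.sub(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', replace_img, html)
    html = re.sub(r"<link\b[^>]*>", replace_link, html)
    return html
```

With Selenium, this could be called as `inline_resources(driver.page_source, driver.current_url, fetch)`, where `fetch` is whatever download function you prefer (for example a small wrapper around `urllib.request.urlopen`). Images referenced from inside CSS, scripts, and single-quoted attributes are not covered; a real HTML parser would be a more robust choice than regular expressions.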