Save complete web page (incl css, images) using python/selenium

一个人的身影 2020-12-14 17:43

I am using Python/Selenium to submit genetic sequences to an online database, and want to save the full page of results I get back. Below is the code that gets me to the res

4 answers
  •  自闭症患者
    2020-12-14 18:24

    Inspired by FThompson's answer above, I wrote a tool that downloads the complete HTML for a given page URL (see: https://github.com/markfront/SinglePageFullHtml).

    UPDATE: following Max's suggestion, here are the steps to use the tool:

    1. Clone the project, then build it with Maven:

    $> git clone https://github.com/markfront/SinglePageFullHtml.git
    $> cd SinglePageFullHtml
    $> mvn clean compile package

    2. Find the generated jar file in the target folder: SinglePageFullHtml-1.0-SNAPSHOT-jar-with-dependencies.jar

    3. Run the jar from the command line, like this:

    $> java -jar ./target/SinglePageFullHtml-1.0-SNAPSHOT-jar-with-dependencies.jar 

    4. The result file name has the prefix "FP", followed by the hash code of the page URL, and the file extension ".html". Look for it in the temp folder, typically "/tmp" (in Java, System.getProperty("java.io.tmpdir")). If it is not there, try your home directory (in Java, System.getProperty("user.home")).

    5. The result file is a single large, self-contained HTML file that includes everything (CSS, JavaScript, images, etc.) referenced by the original HTML source.
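
Since the question is about a Python workflow, here is a minimal sketch for finding the result file from Python after the jar has run. It relies only on the naming described above (prefix "FP", extension ".html") and searches the temp folder and the home directory. The function name `find_result_files` is made up for this illustration, and Python's temp folder is assumed to match the one Java reports, which may not hold on every system.

```python
import tempfile
from pathlib import Path


def find_result_files(directories=None):
    """Return files named FP*.html in the given directories, newest first.

    By default this checks the system temp folder and the user's home
    directory, the two places the answer says the output may land.
    (Assumption: Python's temp folder is the same one Java uses.)
    """
    if directories is None:
        directories = [Path(tempfile.gettempdir()), Path.home()]

    matches = []
    for directory in directories:
        directory = Path(directory)
        if directory.is_dir():
            # Non-recursive search: only files directly inside the folder.
            matches.extend(p for p in directory.glob("FP*.html") if p.is_file())

    # The most recently modified file is most likely the page just saved.
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for path in find_result_files():
        print(path)
```

If several pages have been saved, the first entry in the returned list is the newest one; you can pass your own list of folders if the file ends up somewhere else.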
