How to get innerHTML of whole page in selenium driver?

予麋鹿 2020-12-10 10:40

I'm using Selenium to navigate to the web page I want, and then I parse the page with Beautiful Soup.

Somebody has shown how to get inner

3 Answers
  • 2020-12-10 10:52

    To get the HTML for the whole page:

    from selenium import webdriver
    
    driver = webdriver.Firefox()
    driver.get("http://stackoverflow.com")
    
    html = driver.page_source
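
Since the question goes on to parse the page, here is a minimal sketch of handing that string to a parser. With Beautiful Soup installed this would be `BeautifulSoup(html, "html.parser")`; to stay dependency-free, the sketch swaps in the standard library's `html.parser` as a stand-in. The `LinkCollector` class and the sample `html` string are assumptions made for illustration; in practice `html` would come from `driver.page_source`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag (hypothetical helper, for illustration only)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# Stand-in for `driver.page_source`; any HTML string is handled the same way.
html = '<html><body><a href="/questions">Questions</a><a href="/tags">Tags</a></body></html>'

collector = LinkCollector()
collector.feed(html)
links = collector.links
```

The parser only ever sees a plain string, so it does not matter whether the HTML came from `page_source` or from one of the `execute_script` calls.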
    

    To get the outer HTML (tag included):

    # `find_element_by_css_selector` was removed in Selenium 4; use `find_element` with `By`
    from selenium.webdriver.common.by import By
    
    # HTML from `<html>`
    html = driver.execute_script("return document.documentElement.outerHTML;")
    
    # HTML from `<body>`
    html = driver.execute_script("return document.body.outerHTML;")
    
    # HTML from element with some JavaScript
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = driver.execute_script("return arguments[0].outerHTML;", element)
    
    # HTML from element with `get_attribute`
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = element.get_attribute('outerHTML')
    

    To get the inner HTML (tag excluded):

    # `find_element_by_css_selector` was removed in Selenium 4; use `find_element` with `By`
    from selenium.webdriver.common.by import By
    
    # HTML from `<html>`
    html = driver.execute_script("return document.documentElement.innerHTML;")
    
    # HTML from `<body>`
    html = driver.execute_script("return document.body.innerHTML;")
    
    # HTML from element with some JavaScript
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = driver.execute_script("return arguments[0].innerHTML;", element)
    
    # HTML from element with `get_attribute`
    element = driver.find_element(By.CSS_SELECTOR, "#hireme")
    html = element.get_attribute('innerHTML')
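
The whole-page variants above differ only in which node and which property the script reads, so they can be folded into one small helper. This is a rough sketch, not part of the original answer: the names `get_page_html`, `scope`, and `kind` are hypothetical, and the only driver method it relies on is `execute_script`, as used above.

```python
# Maps (scope, kind) to the JavaScript snippets shown in the answer above.
_SCRIPTS = {
    ("html", "outer"): "return document.documentElement.outerHTML;",
    ("html", "inner"): "return document.documentElement.innerHTML;",
    ("body", "outer"): "return document.body.outerHTML;",
    ("body", "inner"): "return document.body.innerHTML;",
}


def get_page_html(driver, scope="html", kind="inner"):
    """Return the inner or outer HTML of <html> or <body> (hypothetical helper).

    `driver` is assumed to be any object exposing `execute_script(script)`,
    such as a Selenium WebDriver instance.
    """
    try:
        script = _SCRIPTS[(scope, kind)]
    except KeyError:
        raise ValueError(f"unsupported combination: scope={scope!r}, kind={kind!r}")
    return driver.execute_script(script)
```

For the case in the question title, `get_page_html(driver)` would return the inner HTML of the `<html>` element, which can then be passed to the parser.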
    
  • 2020-12-10 10:56

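    `driver.page_source` is the Python property and did not work for me; in the JavaScript bindings (`selenium-webdriver`) the equivalent is `getPageSource()`. The following worked for me: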

    let html = await driver.getPageSource();
    

    Reference: https://seleniumhq.github.io/selenium/docs/api/javascript/module/selenium-webdriver/ie_exports_Driver.html#getPageSource

  • 2020-12-10 11:12

    Using a page object (Java):

    @FindBy(xpath = "xpath")  // placeholder: replace with the element's real XPath
    private WebElement element;
    
    public String getInnerHtml() {
        // waitUntilElementToBeClickable is this page object's own wait helper (not shown here)
        String innerHtml = waitUntilElementToBeClickable(element, 10).getAttribute("innerHTML");
        System.out.println(innerHtml);
        return innerHtml;
    }
    