Return html code of dynamic page using selenium

偶尔善良 submitted on 2021-01-29 03:10:16

Question


I'm trying to crawl this website, but its content is loaded dynamically.

Basically, I want the HTML I can see in the browser console, not what I get from right click > show source.

I've tried some Selenium examples, but I can't get what I need. The code below uses Selenium and returns only what right click > show source gives. How can I get the content of the loaded page?

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver
import time

# Start the WebDriver and load the page
wd = webdriver.Firefox()
wd.get("https://www.leforem.be/particuliers/offres-emploi-recherche-par-criteres.html?exParfullText=&exPar_search_=true&exParGeographyEdi=true")

# Wait for the dynamically loaded elements to show up
time.sleep(5)

# And grab the page HTML source
html_page = wd.page_source
wd.quit()

# Now you can use html_page as you like

print(html_page)

Answer 1:


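The search results are rendered inside a frame (named "cible" in the code below), so you need to switch to that frame and then explicitly wait for the search results to appear before getting the page source: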

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


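# Start the WebDriver and load the page
wd = webdriver.Firefox()
wd.get("https://www.leforem.be/particuliers/offres-emploi-recherche-par-criteres.html?exParfullText=&exPar_search_=true&exParGeographyEdi=true")

# The results live inside the "cible" frame, so switch into it first
wd.switch_to.frame("cible")

# Wait up to 10 seconds for the first result cell to become visible
wait = WebDriverWait(wd, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'td.resultatIntitule')))

# Print the HTML now that the results have loaded, then close the browser
print(wd.page_source)
wd.quit()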
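
To show why an explicit wait behaves differently from the fixed time.sleep(5) in the question, here is a rough, browser-free sketch of the polling idea behind it. It is not Selenium's actual implementation: wait_until, FakePage and find_results are hypothetical names made up for this illustration, and the fake page simply pretends its results show up after a few polls.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns something truthy.

    Returns that value as soon as it appears, or raises TimeoutError
    once `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page whose results load after a delay."""

    def __init__(self, polls_before_ready):
        self.polls_left = polls_before_ready

    def find_results(self):
        # Return an empty list while the "page" is still loading.
        if self.polls_left > 0:
            self.polls_left -= 1
            return []
        return ["<td class='resultatIntitule'>result row</td>"]


if __name__ == "__main__":
    page = FakePage(polls_before_ready=3)
    rows = wait_until(page.find_results, timeout=2.0, poll_interval=0.01)
    print(len(rows))  # prints 1
```

Unlike a fixed sleep, this kind of loop returns as soon as the content is there and fails loudly if it never arrives, which is roughly what the WebDriverWait call in the answer gives you.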


Source: https://stackoverflow.com/questions/30891621/return-html-code-of-dynamic-page-using-selenium
